ABSTRACT

The Shear TEsting Programme (STEP) is a collaborative project to improve the accuracy
and reliability of weak lensing measurement, in preparation for the next generation of wide-
field surveys. We review sixteen current and emerging shear measurement methods in a com- 
mon language, and assess their performance by running them (blindly) on simulated images 
that contain a known shear signal. We determine the common features of algorithms that most 
successfully recover the input parameters. A desirable goal would be the combination of their 
best elements into one ultimate shear measurement method. In this analysis, we achieve pre- 
viously unattained discriminatory precision via a combination of more extensive simulations 
and pairs of galaxy images that have been rotated with respect to each other. That removes the
otherwise overwhelming noise from their intrinsic ellipticities. Finally, the robustness of our 
simulation approach is confirmed by testing the relative calibration of methods on real data. 

Weak lensing measurement has improved since the first STEP paper. Several methods 
now consistently achieve better than 2% precision, and are still being developed. However, 
we can now distinguish all methods from perfect performance. Our main concern continues 
to be the potential for a multiplicative shear calibration bias: not least because this cannot
be internally calibrated with real data. We determine which galaxy populations are respon- 
sible and, by adjusting the simulated observing conditions, we also investigate the effects of 
instrumental and atmospheric parameters. We have isolated several previously unrecognised 
aspects of galaxy shape measurement, in which focussed development could provide further 
progress towards the sub-percent level of precision desired for future surveys. These areas in- 
clude the suitable treatment of image pixellisation and galaxy morphology evolution. Ignoring 
the former effect affects the measurement of shear in different directions, leading to an overall 
underestimation of shear and hence the amplitude of the matter power spectrum. Ignoring the 
second effect could affect the calibration of shear estimators as a function of galaxy redshift, 
and the evolution of the lensing signal, which will be vital to measure parameters including 
1 INTRODUCTION

The observed shapes of distant galaxies become slightly distorted 
by the (differential) gravitational deflection of a light bundle as it 
passes near foreground mass structures. Such "cosmic shear" hap- 
pens regardless of the nature and state of the foreground mass. 
It is therefore a uniquely powerful probe of the cosmic mass 
distribution, dominated by dark matter. Observations of gravita- 
tional lensing are directly and simply linked to theories of struc- 
ture formation that are otherwise ill-equipped to predict the dis-
which affects optical surveys, nor by unknown physics of distant
supernovae (e.g. Hillebrandt & Niemeyer 2000; James et al. 2006;
Sullivan et al. 2006; Travaglio, Hillebrandt & Reinecke),
nor by the uncertain relations between the mass of galaxy
clusters and their observable X-ray luminosity or temperature
(e.g. Huterer & White 2003; Pierpaoli, Scott & White 2001;
Viana, Nichol & Liddle 2002). Gravitational lensing is a purely
geometric effect, requiring knowledge of only deflection angles and
distances. By directly observing the growth of the mass structures
over cosmic time, and by investigating the large-scale geometry
of the universe, it is also an effective probe of dark energy
(Semboloni et al. 2006a; Hoekstra et al. 2006; Jarvis et al. 2006;
Schimd et al. 2006) and can test alternative theories of gravity that
move beyond general relativity (White & Kochanek 2001).

The practical use of weak lensing in cosmology effectively
began with the simultaneous detection of a coherent cosmic
shear signal by four independent groups (Bacon, Refregier & Ellis
2000; Kaiser, Wilson & Luppino 2000; Van Waerbeke et al. 2000;
Wittman et al. 2000). Since then, the field of weak lensing has
advanced dramatically. Large, dedicated surveys with ground- and
space-based telescopes have recently measured the projected 2D
power spectrum of the large-scale mass distribution and drawn
competitive constraints on the matter density parameter $\Omega_m$ and
the amplitude of the matter power spectrum $\sigma_8$ (Maoli et al. 2001;
Rhodes et al. 2001; Van Waerbeke et al. 2001; Hoekstra et al. ...
2003). The results from these efforts are found to be in broad agreement
and are rapidly becoming more credible, with the most recent
publications presenting several different diagnostic tests to determine
the levels of systematic error. Ambitious plans are being laid
for dedicated telescopes both on the ground (e.g. VST-KIDS, DES,
VISTA darkCAM, Pan-STARRS, LSST) and in space (e.g. DUNE,
SNAP, JDEM). Indeed, future weak lensing surveys were recently
identified as the most promising route to understanding the nature
of dark energy by the joint NSF-NASA-DOE Astronomy and Astrophysics
Advisory Committee (AAAC) and NSF-DOE High Energy
Physics Advisory Panel (HEPAP) Dark Energy Task Force.
The importance of weak lensing in future cosmological and astro- 
physical contexts seems assured. 

However, the detection and measurement of weak gravitational
lensing presents a technical challenge. The ~1% distortion
induced in the observed shapes of galaxies is an order of
magnitude smaller than their typical intrinsic ellipticities, and a
similar factor smaller than the spurious shape distortions created
by convolution with the telescope's point spread function (PSF).
Correction for these effects is crucial and complex. To test the
reliability of weak lensing measurements, it has therefore been
necessary since the first detections to manufacture simulated images
that closely resemble real data but contain a known shear
signal. Bacon et al. (2001), Erben et al. (2001) and Hoekstra et al.
(2002) ran their shear measurement methods on such images.
By comparing the input and mean measured shears, they determined
the calibration error inherent to each technique, and in some
cases discovered (and hence corrected) a multiplicative calibration
bias. This bias matters most because it cannot be self-calibrated
from a survey itself. Other systematics can be checked for in real
data via correlation of the galaxies and the PSF, or via an E-B
decomposition (Schneider et al. 2002; Crittenden et al. 2002;
Schneider & Kilbinger 2006). These early tests determined that
the first successful shear measurement methods were accurate to
$\lesssim 10\%$ of the signal.

To maximise progress in this technical field, and to foster
the exchange of data and theoretical knowledge within the weak
lensing community, we launched the Shear TEsting Programme
(STEP). In the first STEP paper (Heymans et al. 2005, STEP1),
we parametrized the performance of methods in terms of their multiplicative
shear calibration bias $m$, an additive residual shear offset
$c$ and, in some cases, a nonlinear responsivity to shear $q$. That analysis
confirmed that the main difficulty in weak lensing lies in the
calibration of the shear signal, but encouragingly showed that all
of the methods used on existing weak lensing surveys achieve better
than ~7% accuracy. Shear measurement error is therefore not
currently a dominant source of error.
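
The $m$ and $c$ parametrization can be illustrated with a short fit. This is a minimal sketch, assuming the usual linear model $\gamma^{\rm meas} - \gamma^{\rm input} = m\,\gamma^{\rm input} + c$ for one shear component and ignoring the nonlinear term $q$; the function name `fit_bias` and the example numbers are hypothetical and are not taken from the STEP analysis.

```python
def fit_bias(gamma_input, gamma_measured):
    """Least-squares fit of (gamma_measured - gamma_input) = m * gamma_input + c.

    Assumed linear bias model for a single shear component; the nonlinear
    responsivity q is ignored in this sketch.
    """
    n = len(gamma_input)
    if n < 2:
        raise ValueError("need at least two images to fit m and c")
    residual = [gm - gi for gi, gm in zip(gamma_input, gamma_measured)]
    mean_x = sum(gamma_input) / n
    mean_y = sum(residual) / n
    sxx = sum((x - mean_x) ** 2 for x in gamma_input)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(gamma_input, residual))
    m = sxy / sxx              # multiplicative calibration bias
    c = mean_y - m * mean_x    # additive residual shear offset
    return m, c


# Hypothetical example: a method that underestimates shear by 2%
# and adds a constant offset of 0.001.
inputs = [-0.06, -0.03, 0.0, 0.02, 0.05]
measured = [(1 - 0.02) * g + 0.001 for g in inputs]
m, c = fit_bias(inputs, measured)
```

In practice each point would be the mean shear measured in one simulated image, and the scatter about the fitted line sets the uncertainty on $m$ and $c$.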

Unfortunately, this accuracy will not be sufficient to realise the 
potential of the ambitious and much larger future surveys. STEP1
found that the most accurate shear measurement methods were suc- 
cessfully calibrated to within a few percent, but the limited size and 
precision of the first STEP simulations forbade any finer analysis 
than this. The morphologies of galaxies in the first simulated im- 
ages were also overly simplistic, in a way that did not fully test 
the assumptions of some shear measurement methods that galaxies 
lack substructure and complex shapes. 

In this second STEP paper, we include complex galaxy morphologies
and conduct a more precise test of current and developing
shear measurement algorithms to the $\sim 0.5\%$ level. We achieve
this precision through the combination of a more extensive set of
simulated images and an ingenious use of galaxy pairs rotated with
respect to each other (Nakajima & Bernstein 2006). This removes
the otherwise dominant noise from galaxies' intrinsic ellipticities.
The new set of simulated images has also been designed to span a 
wide range of realistic observing conditions and isolate several po- 
tentially challenging aspects of shear calibration in which the accu- 
racy of shear recovery may begin to deteriorate. The data set is suf- 
ficiently large for it to be divided into different simulated observing 
conditions and for independent tests to be carried out within each. 
We thereby test the effects of the following parameters on shear 
measurement precision:

• Complex galaxy morphology

• Galaxy size 

• Galaxy magnitude 

• Selection effects related to galaxy ellipticity 

• Direction of the shear signal relative to the pixel grid 

• PSF size 

• PSF ellipticity 

Sixteen different shear measurement codes have been run
on the simulated images. These fall into four distinct
categories. We provide a brief description of each algorithm,
and outline the relative successes of each method. The
STEP programme has dramatically sped the development of
new shear measurement methods (e.g. Refregier & Bacon 2003;
Bernstein & Jarvis 2002; Massey & Refregier 2005; Kuijken 2006;
Nakajima & Bernstein 2006; Bridle et al. in preparation), and we
particularly focus on these. However, these methods necessarily remain
experimental, and development continues. The results from
such methods should therefore be taken as an indication of progress
rather than a judgement on their ultimate potential.

This paper is organised as follows. In §2 we describe the simulated
images. In §3 we review the different shear measurement
methods used by each author, translating them into a common language
for ease of comparison, and categorising them into four distinct
groups. In §4 we compare each author's measured shear with
the input signal, and split the simulations in various ways to isolate
areas of potential difficulty in shear measurement. Because of the
number of different methods used, this is a rather daunting process.
In §5 we provide some perspective on the results, assessing the
relative performance of the different methods, and the categories
of methods. In §6 we derive some general conclusions and outline
suggestions for future development.



2 SIMULATED IMAGES 



We have used the Massey et al. (2004a) simulation package to manufacture
artificial images that closely resemble deep r-band data
taken in good conditions with the Suprime-Cam camera on the
Subaru telescope. We specifically mimic the weak lensing survey
data of Miyazaki et al. (2002b). The Subaru telescope was built
with careful consideration of weak lensing requirements, and has
reliably obtained the highest quality weak lensing data to date
(Miyazaki et al. 2002a; Wittman 2005; Kasliwal et al. in preparation).
It therefore represents the current state-of-the-art, and will
most closely match future dedicated survey instruments. The simulated
images are publicly available for download from the STEP
website†.

To aid the interpretation of our results, the simulated images 
incorporate several "unrealistic" simplifications; neither the noise 
level, the input shear signal nor the PSF vary as a function of po- 
sition. This does not adversely affect the validity of the results, as 
any combination of PSF size, PSF ellipticity, and shear signal can 
usually be found in one of the images. However, it does let us sim- 
ply average the measured shear for the large number of galaxies in 
each image, without explicitly keeping track of either the shear or 
PSF applied to each object. As in STEP1, the main figure of merit
throughout our analysis will be the mean shear measured within
each image, $\langle\tilde\gamma\rangle$, and deviations of that from the known input shear
$\gamma^{\rm input}$. If the mean shear can be determined without bias for any
input shear (and for any PSF), all of the commonly-used statistics
typical in cosmic shear analysis should also be unbiased (but the
distribution of the shear estimates will affect their noise level).

Image set   PSF description                       Galaxy type
A           Typical Subaru PSF (~0.6")            shapelets
B           Typical Subaru PSF (~0.6")            pure exponential
C           Enlarged Subaru PSF (~0.8")           shapelets
D           Elliptical PSF aligned along x-axis   shapelets
E           Elliptical PSF aligned at 45°         shapelets
F           Circularly symmetric Subaru PSF       shapelets

Table 1. The six different sets of images used in the STEP2 analysis are
carefully chosen to isolate and test particular aspects of weak shear measurement.
Either the PSF shape, or the form of galaxies' intrinsic morphologies
varies in a prescribed way between sets.

To address the specific topics outlined in the introduction, we
manufactured six sets of simulated images. These span a range of
realistic observing conditions, in a carefully orchestrated way that
will isolate various effects. The differences between the images are
described in table 1. Each set contains 128 7' x 7' images, with
a pixel scale of 0.2". In the first simulated image of each set, the
galaxies are not sheared. For the next 63 images, which all feature
the same patch of sky in order to maximise sensitivity to shear calibration,
the galaxies are sheared by a random amount. This amount
is chosen with a flat PDF within $|\gamma^{\rm input}| < 6\%$. To concentrate
on cosmic shear measurement rather than cluster mass reconstruction,
this limit is smaller than the maximum shears used in STEP1.
However, the shears are now crucially chosen from a continuous
distribution and are allowed to be in any direction relative to the
pixel grid. Note that we are really attempting to measure "reduced
shear" (Seitz & Schneider 1997) throughout this analysis, although
there is explicitly zero convergence in the simulations. The input
signals were not disclosed to any of the groups analysing the data.

We can predict the signal to noise ratio in the shear measurement
from these images. We first define a complex ellipticity for
each galaxy

$$ e = e_1 + i e_2 = \frac{a-b}{a+b}\left(\cos(2\theta) + i\sin(2\theta)\right) , \qquad (1) $$

where $a$ and $b$ are the major and minor axes, and $\theta$ is the orientation
of the major axis from the $x$-axis. This definition is widely used
because it is more convenient than a two-component parametrization
involving $\theta$. Both the real and imaginary parts are well-defined
(zero) for a circular object or, on average, for an unsheared population
of objects. In the absence of PSF smearing and shear measurement
errors, the observed galaxy ellipticity $e^{\rm obs}$ is related to its
intrinsic ellipticity $e^{\rm int}$ by

$$ e^{\rm obs} = \frac{e^{\rm int} + \gamma}{1 + \gamma^* e^{\rm int}} \qquad (2) $$

(Seitz & Schneider 1997), where $\gamma = \gamma_1 + i\gamma_2$ is the complex shear
applied to each image. With only a finite number $N$ of galaxies, all
with nonzero intrinsic ellipticity, measurement of the mean shear
$\langle\tilde\gamma\rangle = \langle e^{\rm obs}\rangle$ is limited by an intrinsic shot noise

$$ {\rm SN~error} = \langle e^{\rm int}\rangle = 0 \pm \sqrt{\frac{\langle (e_i^{\rm int})^2\rangle}{N}} . \qquad (3) $$

† http://www.physics.ubc.ca/~heymans/step.html



In the STEP2 simulations, $\sqrt{\langle (e_i^{\rm int})^2\rangle} \sim 0.1$, about an order of magnitude
larger than the shear signal.
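
The transformation of equation (2) and the resulting shot noise can be tried out numerically. The sketch below is illustrative only: it assumes Gaussian-distributed intrinsic ellipticity components of width 0.1 (with $|e| < 1$ enforced by rejection), which is a simplification and not the ellipticity distribution of the STEP2 simulations, and all function names are hypothetical.

```python
import random


def shear_ellipticity(e_int: complex, gamma: complex) -> complex:
    """Equation (2): observed ellipticity under a (reduced) shear gamma."""
    return (e_int + gamma) / (1 + gamma.conjugate() * e_int)


def random_ellipticity(rng: random.Random, sigma: float) -> complex:
    """Draw an intrinsic ellipticity with Gaussian components.

    Assumption of this sketch: components are independent Gaussians of
    width sigma, and draws with |e| >= 1 are rejected.
    """
    while True:
        e = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        if abs(e) < 1.0:
            return e


def mean_shear(n_gal: int, gamma: complex, sigma: float = 0.1,
               seed: int = 1) -> complex:
    """Average e_obs over n_gal galaxies; the scatter falls as 1/sqrt(N)."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(n_gal):
        total += shear_ellipticity(random_ellipticity(rng, sigma), gamma)
    return total / n_gal


gamma_input = 0.03 + 0.01j
estimate = mean_shear(20000, gamma_input)
```

With 20000 galaxies the expected scatter per component is about $0.1/\sqrt{20000} \approx 0.0007$, so the estimate should agree with the input shear to roughly that level.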

Since the morphologies of the simulated galaxies are uncorrelated,
this noise can be slowly beaten down by increasing the size
of the simulations. But to dramatically improve the efficiency of the
simulations, and circumvent the meagre $1/\sqrt{N}$ behaviour, we introduce
an innovation in the remaining 64 images. Following a suggestion
in Nakajima & Bernstein (2006), the entire sky, including
the galaxies, was artificially rotated by 90° before being sheared by
the same signals and being convolved with the same PSF as before.
This rotation flips the sign of galaxies' intrinsic ellipticities. To measure
biases in shear measurement methods, we can then consider
matched pairs of shear estimators from the unrotated and rotated
version of each galaxy. Averaging these estimators explicitly cancels
the intrinsic shape noise, leaving only measurement noise and
any imperfections in shear measurement. We thus form a shear estimator
for each galaxy pair

$$ \tilde\gamma = \left(e^{\rm obs,unrot} + e^{\rm obs,rot}\right)/2 . \qquad (4) $$

Since $e^{\rm int,unrot} = e^{\rm int} = -e^{\rm int,rot}$, we can use equation (2) to find

$$ \tilde\gamma = \frac{\gamma - \gamma^*\,(e^{\rm int})^2}{1 - (\gamma^* e^{\rm int})^2} . \qquad (5) $$

Averaging this shear estimator over $N/2$ galaxy pairs now gives a
shot noise error in $\langle\tilde\gamma\rangle$ of

$$ {\rm SN~error} \approx -\gamma^* \langle (e^{\rm int})^2\rangle = 0 \pm |\gamma| \sqrt{\frac{\langle (e_i^{\rm int})^4\rangle}{N}} , \qquad (6) $$

which has been significantly reduced from equation (3). In the
STEP2 simulations $\sqrt{\langle (e_i^{\rm int})^4\rangle} \sim 0.05$ and $|\gamma| < 0.06$. Nothing
is lost by this approach. All 128 images can still be analysed independently,
and we do pursue this approach in order to measure
the total shape measurement noise in an ordinary population of
galaxies.
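
The cancellation achieved by the rotated pairs can be checked directly. The following sketch applies the reduced-shear transformation of equation (2) to a galaxy and to its 90° rotated twin (intrinsic ellipticity of opposite sign), averages the two, and compares the result with the closed form $(\gamma - \gamma^* e^2)/(1 - (\gamma^* e)^2)$ that follows algebraically from that average. The numerical values are arbitrary examples, not STEP2 inputs.

```python
def shear_ellipticity(e_int: complex, gamma: complex) -> complex:
    """Reduced-shear transformation of a complex ellipticity (equation 2)."""
    return (e_int + gamma) / (1 + gamma.conjugate() * e_int)


def pair_estimator(e_int: complex, gamma: complex) -> complex:
    """Average of the unrotated and 90-degree rotated galaxy (equation 4).

    Rotating a galaxy by 90 degrees flips the sign of its intrinsic
    ellipticity, so the rotated twin has ellipticity -e_int.
    """
    e_unrot = shear_ellipticity(e_int, gamma)
    e_rot = shear_ellipticity(-e_int, gamma)
    return 0.5 * (e_unrot + e_rot)


def pair_closed_form(e_int: complex, gamma: complex) -> complex:
    """Closed form of the pair average obtained from equation (2)."""
    g_conj = gamma.conjugate()
    return (gamma - g_conj * e_int ** 2) / (1 - (g_conj * e_int) ** 2)


# Arbitrary example values.
e_int = 0.3 + 0.2j
gamma = 0.05 - 0.02j

single_error = abs(shear_ellipticity(e_int, gamma) - gamma)
pair_error = abs(pair_estimator(e_int, gamma) - gamma)
```

For this example the single-galaxy error is of order $|e^{\rm int}|$, while the pair error is of order $|\gamma|\,|e^{\rm int}|^2$, which is why far fewer galaxies are needed to reach a given precision.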

The Massey et al. (2004a) image simulation pipeline required
extensive development from previously published versions to
mimic ground-based data. We shall therefore now describe its three
main ingredients: stars (i.e. PSF), galaxies and noise.

Figure 1. The point spread functions (PSFs) used to generate the six different
sets of simulated images. The colour scale is logarithmic, and the
contours, which are overlaid at the same absolute value on each PSF,
are spaced logarithmically by factors of two. They are designed to target
specific aspects of weak lensing measurement that could potentially prove
difficult to control. See table 1 and the text for a description of each PSF.



2.1 Stars 

The simulated images are observed after convolution with
various point-spread functions (PSFs). The PSF shapes are
modelled on real stars observed in Suprime-Cam images, and
are expressed as weighted sums of shapelets,
basis functions that can be used to describe the shape of any isolated
object. The decomposition of an image into shapelet space acts
rather like a localised Fourier transform, with images $f(\mathbf{x})$ being
expressed in shapelet space as a set of indexed coefficients $f_{n,m}$
that weight the corresponding basis function

$$ f(\mathbf{x}) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} f_{n,m}\, \chi_{n,m}(r,\theta;\beta) , \qquad (7) $$

with $m \le n$, and where the Gauss-Laguerre basis functions are

$$ \chi_{n,m}(r,\theta;\beta) = \frac{C_{n,m}}{\beta} \left(\frac{r}{\beta}\right)^{|m|} L^{|m|}_{\frac{n-|m|}{2}}\!\left(\frac{r^2}{\beta^2}\right) e^{-r^2/2\beta^2}\, e^{-im\theta} , \qquad (8) $$

with a normalising constant $C_{n,m}$ and scale size $\beta$.

The PSFs can therefore take a complex form. They contain 
substructure, skewness and chirality. In general, the ellipticity of 
their isophotes varies as a function of radius. For computational 
efficiency, the shapelet series is truncated at order $n_{\rm max} = 12$. The
limited wings and the rapid convergence of the PSFs to zero at large
radii compared to those used in STEP1 is not a consequence of this
truncation, but a confirmation of the excellent optical qualities of 
Suprime-Cam. 

PSF A is modelled from a fairly typical star towards the centre
of a 40 minute long Suprime-Cam exposure (which, in practice, is
likely to be assembled from four 10 minute exposures). It has a full-width
at half-maximum (FWHM) of 0.6". PSF B is identical to PSF A.
PSF C is the same star, but enlarged to model slightly worse seeing,
and has a FWHM of 0.8". This is the worst that might be expected
in future weak lensing surveys, with nights during poorer conditions
typically used to obtain data in additional colours. PSF D is
modelled on a star at the edge of the same Suprime-Cam exposure.
The phases of all of its $m = 2$ shapelet coefficients were adjusted
to the same value so that at all radii (and therefore with any radial
weight function), its ellipticity derived from quadrupole moments
points in exactly the same direction. Substructure and skewness apparent
in the real Subaru PSF are otherwise untouched. As PSF D,
the ellipticity is directed parallel to the x-axis of the pixel grid. The
star is rotated by 45° to make PSF E. It is an example of extreme ellipticity,
which highlights ellipticity-dependent effects. However, it
might be possible to limit such ellipticity in weak lensing surveys
by improving the optical design of future telescopes or optimising
survey tiling and scheduling strategies. PSF F is a circularised
version of that star, obtained by setting all of its $m \neq 0$ shapelet
coefficients to zero, which is equivalent to averaging the PSF over
all possible orientations.

Figure 2. A 1' x 1' section of a simulated image from set A, containing
shapelet galaxies with complex morphologies. The colour scale is logarithmic,
and the same as that in figure 3.
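
The two coefficient manipulations described for PSFs D and F are simple to express once a shapelet model is held as a table of coefficients. The sketch below is a toy illustration under stated assumptions: the model is stored as a dictionary keyed by $(n, m)$ with complex values, coefficients with opposite $m$ are kept as complex conjugates so that the image stays real, and the example coefficients are invented. None of this is taken from the actual simulation pipeline.

```python
import cmath


def circularise(coeffs):
    """Keep only the m = 0 coefficients, i.e. average over all orientations."""
    return {(n, m): c for (n, m), c in coeffs.items() if m == 0}


def align_quadrupole_phase(coeffs, phase):
    """Give every |m| = 2 coefficient the same phase, keeping its amplitude.

    Assumption of this sketch: f_{n,-m} is the complex conjugate of f_{n,m},
    so m = -2 coefficients receive the opposite phase.
    """
    aligned = dict(coeffs)
    for (n, m), c in coeffs.items():
        if abs(m) == 2:
            sign = 1 if m > 0 else -1
            aligned[(n, m)] = cmath.rect(abs(c), sign * phase)
    return aligned


# Invented toy model of a star, truncated at n = 4.
psf = {
    (0, 0): 1.0 + 0j,
    (2, 0): 0.1 + 0j,
    (2, 2): 0.03 + 0.04j,
    (2, -2): 0.03 - 0.04j,
    (4, 2): -0.01 + 0.02j,
    (4, -2): -0.01 - 0.02j,
}

psf_circular = circularise(psf)               # analogue of PSF F
psf_aligned = align_quadrupole_phase(psf, 0)  # analogue of PSF D
```

Coefficients with other values of $m$ are left untouched by `align_quadrupole_phase`, mirroring the statement that substructure and skewness are otherwise preserved.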



2.2 Shapelet galaxies 



Most of the simulated images contain galaxy shapes also constructed
from weighted combinations of the shapelet basis functions,
using a version of the Massey et al. (2004a) image simulation
pipeline modified to imitate ground-based data. The
complex and irregular galaxy morphologies that are possible using
this method represent an important advance from the STEP1 analysis
using the SkyMaker image simulation package (Erben et al.
2001). The measurement of weak lensing in STEP1 was considerably
simplified by the galaxies' smooth and unperturbed isophotes.
Several shear measurement methods are based on the assumption
that galaxy shapes and the PSF are concentric, elliptical, and in
some cases Gaussian. In addition, the SkyMaker galaxies have
reflection symmetry about the centroid which could feasibly cause
any symmetrical errors to vanish. By contrast, PSF correction
and galaxy shape measurement are rendered more challenging in
STEP2 by the realistic morphologies that include spiral arms, dust
lanes and small-scale substructure. Our analysis is thus designed to
test the robustness of weak lensing measurement methods.

Figure 3. A 1' x 1' section of a simulated image from set B, containing
idealised galaxies with exponential radial profiles and simple morphologies.
The colour scale is logarithmic, and the same as that in figure 2.

The joint size-magnitude-morphology distribution of galaxies
was copied from the Hubble Space Telescope COSMOS survey
(Scoville et al. in preparation). This is a uniform, two square degree
set of images taken with the F814W filter on the Advanced
Camera for Surveys (ACS), to a depth of 28.7 for a point source
at 5σ. It is deeper than our intended simulations, and with a much
finer resolution, so provides an ideal source population. The extent
of the COSMOS survey also provided sufficient real galaxies
to avoid duplication in the simulations without needing to perturb
shapelet coefficients, as in section 4 of Massey et al. (2004a). We
simply used the shapelet models of COSMOS galaxies, randomly
rotated, inverted and repositioned. The positions of galaxies in the
simulations were chosen at random, without attempting to reproduce
higher-order clustering.

Since the galaxy models are inevitably truncated at some level
in shapelet space, and since we did not deconvolve the galaxies
from the ACS PSF, the smallest simulated galaxies are intrinsically
slightly rounder than those in real Subaru data. However, this convolution
occurs before shearing and does not alter the necessary
steps for shear measurement. As in real data, the simulated galaxy
ellipticity and morphology distributions do vary with galaxy magnitude
and size. We adopt an alternative definition of ellipticity

$$ (\varepsilon_1, \varepsilon_2) = \frac{a^2 - b^2}{a^2 + b^2} \left(\cos(2\theta), \sin(2\theta)\right) , \qquad (9) $$

where $a$ and $b$ are the major and minor axes, and $\theta$ is the orientation
of the major axis from the $x$-axis. Note the difference from
equation (1); this version is closer to the notation used by most
shear estimators. Before PSF convolution, the width of this ellipticity
distribution

$$ \sigma_\varepsilon^{\rm int} \equiv \left( \langle(\varepsilon_1^{\rm int})^2\rangle + \langle(\varepsilon_2^{\rm int})^2\rangle \right)^{1/2} \qquad (10) $$

as measured by SExtractor (Bertin & Arnouts 1996) is $\sigma_\varepsilon^{\rm int} =
0.35 \pm 0.03$ at $r = 22$ and $0.20 \pm 0.02$ at $r = 26$. Note
that this $\varepsilon$ is a different quantity than the $e$ used in equation (3).

The galaxies were then sheared analytically in shapelet space,
using equation (41) of Massey & Refregier (2005). This operation
is to first order in $\gamma$. Terms of order $\gamma^2$ are ignored, but, for typical
galaxy shapes, the coefficients by which these are multiplied are
also smaller than those multiplying the first order terms. This therefore
introduces only a very small error. The galaxies were then convolved
with the PSF, also in shapelet space, using equation (52) of
Refregier (2003). They were pixellated by analytically integrating
the shapelet models within adjoining squares, using equation (34)
of Massey & Refregier (2005).



2.3 Idealised galaxies 

We have also manufactured one set (B) of simulated images with 
the same observing conditions but in which the galaxies have 
simple, exponential profiles and concentric, elliptical isophotes. 
These idealised galaxies provide a contrast to the morphological 
sophistication of the shapelet galaxies, and an independent test of 
the shapelet-based shear measurement methods. We intentionally 
chose a very simple form for the idealised galaxy shapes, with a 
sharp cusp and extended wings, to most effectively pronounce any 
difference to the results from galaxies with realistically complex 
morphologies. As before, the size-magnitude distribution of un- 
sheared galaxies was modelled on that observed in the ACS COS- 
MOS images. Galaxy ellipticities were assigned randomly from a 
Gaussian distribution. Like STEP1, we used a constant distribution
of intrinsic ellipticity. This had width $\sigma_\varepsilon^{\rm int} = 0.3$ for galaxies at all
magnitudes.

To add a shear signal, the random ellipticities are then perturbed
at the catalogue level. Under a small shear $\gamma_i$, the ellipticity
$\varepsilon$ defined in equation (9) transforms as

$$ \varepsilon_i^{\rm obs} = \varepsilon_i^{\rm int} + 2\left(\delta_{ij} - \varepsilon_i^{\rm int}\varepsilon_j^{\rm int}\right)\gamma_j + O(\gamma^2) \qquad (11) $$

(e.g. Rhodes et al. 2000), where $\delta_{ij}$ is the Kronecker-delta symbol
and the summation convention is assumed. Similarly, the mean
square radius $d^2 = a^2 + b^2$ becomes

$$ d^2 \left(1 + 2\varepsilon_i^{\rm int}\gamma_i\right) + O(\gamma^2) . \qquad (12) $$

These two expressions are valid up to first order in the shear. Note
that, to this order, the flux $F$ is unaffected by a pure shear. These
results are valid for any galaxy with self-similar isophotes (as long
as the moments converge).

To create a simulated galaxy image $f(\mathbf{x})$ with a desired ellipticity,
we first specify the desired size $r_0$ and mean radial profile
$p(r^2)$, where $r^2 = x_1^2 + x_2^2$ is the square radius and $\mathbf{x} = (x_1, x_2)$
are Cartesian coordinates on the sky, centered on the centroid of the
galaxy. For convenience, we choose the normalisation and angular
scale of the generic profile such that

$$ \int p(r^2)\, d^2x = \int r^2\, p(r^2)\, d^2x = 1 . \qquad (13) $$

The exponential profile used in these simulations is given by

$$ p(r^2) = \frac{6}{2\pi r_0^2}\, e^{-\sqrt{6}\,(r/r_0)} \qquad (14) $$

(c.f. Refregier 2000 for the alternative case of a Gaussian profile).
Using the conventions of equation (13) and a coordinate transformation

$$ J = R(\theta)^T \begin{pmatrix} a^2 & 0 \\ 0 & b^2 \end{pmatrix} R(\theta) = \frac{d^2}{2} \begin{pmatrix} 1+\varepsilon_1 & \varepsilon_2 \\ \varepsilon_2 & 1-\varepsilon_1 \end{pmatrix} , \qquad (15) $$

where $T$ denotes transpose and the rotation matrix

$$ R(\theta) = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} , \qquad (16) $$

it is then easy to show that the elliptical galaxy image should have
surface brightness

$$ f(\mathbf{x}) = F\, |J|^{-1/2}\, p\!\left(\mathbf{x}^T J^{-1} \mathbf{x}\right) , \qquad (17) $$

where the vertical bars denote the matrix determinant. The tails
of their exponential profiles were artificially truncated at elliptical
isophotes $5\,r_0$ from the centre. To pixellate the galaxies, the value
of the analytic function was computed at the centre of each pixel.
The PSF was similarly pixellated, and convolution was then performed
in real space to produce the final image $f(\mathbf{x})$. Strictly, these
operations should be reversed, and they do not commute. However,
the pixels are small and the PSFs are Nyquist sampled, so the error
introduced should be minimal.
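
The catalogue-level shear can be cross-checked against a direct transformation of the second-moment matrix. The sketch below builds $J = R^T\,{\rm diag}(a^2, b^2)\,R$ for an ellipse, applies a small shear through the matrix $A = I + \Gamma$ (with $\Gamma$ the symmetric, traceless shear matrix, an assumption of this sketch valid at first order and with zero convergence), and compares the resulting ellipticity and mean square radius with the first-order expressions $\varepsilon_i + 2(\delta_{ij} - \varepsilon_i\varepsilon_j)\gamma_j$ and $d^2(1 + 2\varepsilon_i\gamma_i)$. Function names and numerical values are hypothetical.

```python
import math


def ellipticity(a, b, theta):
    """Ellipticity (a^2 - b^2)/(a^2 + b^2) with orientation theta."""
    amp = (a * a - b * b) / (a * a + b * b)
    return amp * math.cos(2 * theta), amp * math.sin(2 * theta)


def moment_matrix(a, b, theta):
    """J = R^T diag(a^2, b^2) R with R = [[cos, sin], [-sin, cos]]."""
    c, s = math.cos(theta), math.sin(theta)
    j11 = a * a * c * c + b * b * s * s
    j22 = a * a * s * s + b * b * c * c
    j12 = (a * a - b * b) * c * s
    return [[j11, j12], [j12, j22]]


def matmul(p, q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def shear_moments(j, gamma):
    """Transform J -> A J A with A = I + Gamma (assumed first-order shear)."""
    g1, g2 = gamma
    a_mat = [[1 + g1, g2], [g2, 1 - g1]]
    jp = matmul(matmul(a_mat, j), a_mat)
    d2 = jp[0][0] + jp[1][1]
    return ((jp[0][0] - jp[1][1]) / d2, 2 * jp[0][1] / d2), d2


def shear_first_order(eps, d2, gamma):
    """First-order catalogue-level shear of ellipticity and d^2."""
    dot = eps[0] * gamma[0] + eps[1] * gamma[1]
    new_eps = tuple(e + 2 * (g - e * dot) for e, g in zip(eps, gamma))
    return new_eps, d2 * (1 + 2 * dot)


a, b, theta = 2.0, 1.0, 0.3
gamma = (1e-3, -5e-4)
eps = ellipticity(a, b, theta)
d2 = a * a + b * b
J = moment_matrix(a, b, theta)
eps_matrix, d2_matrix = shear_moments(J, gamma)
eps_linear, d2_linear = shear_first_order(eps, d2, gamma)
```

The two routes differ only at second order in the shear, so for shears of order $10^{-3}$ they agree to about one part in $10^{6}$.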

2.4 Noise 

A two-component noise model is then superimposed onto the images.
Instrumental performance mimics that attained with a stack
of four ten-minute exposures with Suprime-Cam on the 8m Subaru
telescope (Miyazaki et al. 2002b). The images are complete to $r = 25.5$,
and the galaxies selected for lensing analysis are likely to have a
median redshift $z_m \sim 0.9$. This is slightly deeper than most existing
weak lensing surveys, and is towards the deep end of ground-based
surveys planned for the future. The number density of useable
galaxies found in these simulated images is therefore unlikely
to be greatly surpassed.

The first component, "photon counting" shot noise, is
added to the true flux in every pixel. This is drawn from a Gaussian
distribution with a width equal to the square root of the photon
count. The images are then renormalised to units of counts per second.
In the renormalised images, the rms of the Gaussian is 0.033
times the intensity in a pixel.

A second component of sky background is then added
throughout each image, with an rms of 4.43 counts per second. The
DC background level is assumed to be perfectly subtracted. The
model Subaru images were combined using DRIZZLE, and the sky
background noise is correlated in adjacent pixels. To mimic this
effect, we smoothed the sky noise component (but not the flux in
objects) by a Gaussian of FWHM 3.5 pixels. After this process, the
rms of the sky noise is 1.65 counts per second. A simulated image
of a completely blank patch of sky was also available to measure
the covariance between pixels. The correlated noise particularly affects
the detection of small, faint objects, and impedes the calculation
of objects' weights from their detection S/N. It will be instructive
in the future to consider which image resampling kernels and
co-addition methods are optimal for shape measurement, or indeed
whether we should stack the data at all. Jarvis et al. (2003) suggest
measuring galaxy ellipticities on individual frames and combining
these at the catalogue level. Note that faint simulated galaxies are
created to the depth of the COSMOS survey, below the limiting
magnitude of the simulated ground-based images, and these unresolved
sources will also add slightly to the overall sky background.
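
The effect of smoothing the sky noise can be sketched with a small pure-Python example. It draws white Gaussian noise, smooths it with a unit-sum Gaussian kernel of FWHM 3.5 pixels, and measures the rms and the correlation between adjacent pixels. Several choices here are assumptions of the sketch: the 64 x 64 image size, the wrap-around edges, the kernel truncation and normalisation, and the helper names. Because the details of the original smoothing are not given, the output rms is not expected to reproduce the 1.65 counts per second quoted above.

```python
import math
import random


def gaussian_kernel(fwhm_pixels, half_width):
    """1D Gaussian kernel, normalised to unit sum (an assumed convention)."""
    sigma = fwhm_pixels / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    weights = [math.exp(-0.5 * (k / sigma) ** 2)
               for k in range(-half_width, half_width + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_periodic(image, kernel):
    """Separable convolution of a square image with wrap-around edges."""
    size = len(image)
    half = len(kernel) // 2
    offsets = range(-half, half + 1)
    rows = [[sum(kernel[k + half] * row[(x + k) % size] for k in offsets)
             for x in range(size)] for row in image]
    return [[sum(kernel[k + half] * rows[(y + k) % size][x] for k in offsets)
             for x in range(size)] for y in range(size)]


def rms(image):
    """Root mean square of all pixel values (noise is zero-mean)."""
    values = [v for row in image for v in row]
    return math.sqrt(sum(v * v for v in values) / len(values))


def neighbour_correlation(image):
    """Correlation coefficient between horizontally adjacent pixels."""
    size = len(image)
    cross = sum(row[x] * row[(x + 1) % size] for row in image
                for x in range(size))
    power = sum(v * v for row in image for v in row)
    return cross / power


def add_shot_noise(counts, rng):
    """Gaussian approximation to photon noise of width sqrt(counts)."""
    return counts + rng.gauss(0.0, math.sqrt(max(counts, 0.0)))


rng = random.Random(42)
size = 64
white = [[rng.gauss(0.0, 4.43) for _ in range(size)] for _ in range(size)]
sky = smooth_periodic(white, gaussian_kernel(3.5, 6))
rms_before, rms_after = rms(white), rms(sky)
```

The smoothed noise has a lower rms and strongly correlated neighbouring pixels, which is the property that complicates object detection and S/N-based weighting in the text above.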



3 SHEAR MEASUREMENT METHODS 

Sixteen different shear measurement codes have been run on the
simulated images, by the authors listed in table 2. Those that have
been used elsewhere on real data attempt to preserve as similar
a pipeline as possible. Each method must first find and measure
the shape of stars in each image. It must interpolate the PSF shape
across the field, without assuming that it is constant. It must then
find and measure the shapes of galaxies, correcting them appropriately
for the effects of seeing. Note that we still consider object
identification and classification to be part of a shear measurement
method, as shape biases can easily be introduced at this point (e.g.
Bernstein & Jarvis 2002; Hirata & Seljak 2003); however, that task
is likely to be separated in future STEP projects.

Author              Key   Shear measurement method
Berge               JB    Shapelets (Massey & Refregier 2005)
Clowe               C1    KSB+ (same PSF model used for all galaxies)
Clowe               C2    KSB+ (PSF weight size matched to galaxies')
Hetterscheidt       MH    KSB+
Hoekstra            HH    KSB+
Jarvis              MJ    Bernstein & Jarvis (2002)
Jarvis              MJ2   Bernstein & Jarvis (2002) (new weighting scheme)
Kuijken             KK    Shapelets (Kuijken 2006)
Mandelbaum          RM    Reglens (Hirata & Seljak 2003; ...
Nakajima            RN    Bernstein & Jarvis (2002) (deconvolution fitting)
Paulin-Henriksson   SP    KSB+
Schirmer            MS1   KSB+ (scalar shear susceptibility)
Schirmer            MS2   KSB+ (tensor shear susceptibility)
Schrabback          TS    KSB+
Semboloni           ES1   KSB+ (shear susceptibility fitted from population)
Semboloni           ES2   KSB+ (shear susceptibility for individual galaxies)

All of the methods work by obtaining, for each galaxy, a two-component 
polarisation $\varepsilon_i$ that behaves like a generalised ellipticity. 
Precise definitions of polarisation vary between methods, but it 
is important to note that easily measurable quantities do not usually 
change linearly with applied shear, so that 
$\langle\varepsilon\rangle \neq \gamma^{\rm input}$ for all values of 
$\gamma^{\rm input}$. To obtain an unbiased shear estimator, methods must 
determine how their polarisations change under an applied shear, 
and compute either a shear susceptibility tensor 
$P^\gamma_{ij} = \delta\varepsilon_i/\delta\gamma_j$ or 
a shear responsivity factor $\mathcal{R}$. These are essentially interchangeable 
concepts, but with the word "susceptibility" used to imply measurement 
from the higher order shape moments of each galaxy (which 
are then often averaged or fitted across a galaxy population), and 
the word "responsivity" to mean an average susceptibility for the 
population, measured from moments of the galaxy ellipticity distribution. 
In either case, this quantity can be inverted, and used to 
form a shear estimator 

    \tilde{\gamma} = (P^\gamma)^{-1} \varepsilon        (18) 

or 

    \tilde{\gamma} = \varepsilon / \mathcal{R} .        (19) 



When computing the mean shear from a limited subset of galaxies, 
such as those in one size or magnitude bin, we shall investigate 
two approaches to the calculation of $\mathcal{R}$. We try using the constant, 
global value, as has been done in published work, and we also try 
calculating $\mathcal{R}$ from the statistics of the smaller population. The latter 
is more noisy, but takes into account the evolution of galaxy 
morphology between samples (see §5.5). 
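
The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the responsivity is assumed to take the simple form $\mathcal{R} = 2 - \langle\varepsilon^2\rangle$ quoted later for the JB method (equation 39), the catalogue values are invented, and the function names are hypothetical.

```python
def responsivity(polarisations):
    """Assumed form R = 2 - <e1^2 + e2^2>, averaged over the given sample."""
    mean_sq = sum(e1 * e1 + e2 * e2 for e1, e2 in polarisations) / len(polarisations)
    return 2.0 - mean_sq


def mean_shear(polarisations, resp):
    """Mean polarisation of a sample divided by a responsivity factor."""
    n = len(polarisations)
    mean_e1 = sum(e1 for e1, _ in polarisations) / n
    mean_e2 = sum(e2 for _, e2 in polarisations) / n
    return mean_e1 / resp, mean_e2 / resp


# Invented catalogue, split into two size bins with different shape dispersions.
bins = {
    "small": [(0.42, -0.30), (-0.38, 0.34), (0.44, 0.28), (-0.40, -0.24)],
    "large": [(0.14, -0.08), (-0.10, 0.12), (0.16, 0.06), (-0.12, -0.02)],
}
everything = [e for sample in bins.values() for e in sample]
global_resp = responsivity(everything)

for name, sample in bins.items():
    with_global = mean_shear(sample, global_resp)
    with_local = mean_shear(sample, responsivity(sample))
    print(name, "global R:", with_global, "per-bin R:", with_local)
```

Both bins have the same mean polarisation in this toy catalogue, so any difference between the two printed estimates comes purely from the choice of responsivity.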

Table 2. Table of authors and their shear measurement methods. The key 
identifies the authors in all future plots and tables. 

In table 3, the methods are broadly distinguished by their solutions 
to the two most important tasks in shear measurement. Some 
methods correct for the PSF at the catalogue level, by essentially 
subtracting the ellipticities of the PSF from that of each galaxy; 
others attempt to deconvolve each galaxy from the PSF, and measure 
the ellipticity of a reconstructed model. To obtain a polarisation, 
some ("passive") methods measure combinations of galaxies' 
observed shape moments; other ("active") methods shear a 
model of an intrinsically circular source until it most closely resembles 
the observed galaxy. We shall now provide a brief description 
of each method, starting in the top-left quadrant of table 3. 
Since the STEP program has dramatically sped the development 
of new shear measurement methods (Refregier & Bacon 2003; 
Bernstein & Jarvis 2002; Massey & Refregier 2005; Kuijken 2006; 
Nakajima & Bernstein 2006; Bridle et al. in preparation), we 
shall particularly concentrate on the latest developments in those 
algorithms. 

               Passive                Active 
-------------  ---------------------  ---------------- 
Subtraction    KSB+ (various)         BJ02 (MJ, MJ2) 
               Reglens (RM) 
               RRG*, K2K*, Ellipto* 
Deconvolution  Shapelets (JB)         Shapelets (KK) 
                                      BJ02 (RN) 
                                      im2shape* 

Table 3. Broad classification scheme to distinguish different types of shear 
measurement methods. Asterisks denote methods not tested in this paper. 
The top-left quadrant is red; the top-right blue; the bottom-left orange; and 
the bottom-right green. 

3.1 Red class methods 

3.1.1 KSB+ (C1, C2, MH, HH, SP, MS1, MS2, TS, ES1 and ES2) 

The shear measurement method developed by 
Kaiser, Squires & Broadhurst (1995), Luppino & Kaiser (1997) 
and Hoekstra et al. (1998) is in widespread use by many current 
weak lensing surveys. This has led to a high level of optimisation 
of the basic method. The base IMCAT code is publicly available 
from the world wide web. Many variations have been developed, 
and the ten implementations tested in this paper represent a cross-section 
of those that have been applied to real data. The details of 
each method are compared fully in the appendix of STEP1. The 
differences that STEP2 results reveal to be particularly significant 
are summarised again in table 4. 

The core of the method requires the measurement of 
the quadrupole moments of each observed galaxy image $I(\mathbf{x})$ 
weighted by a Gaussian of size $r_g$. From these are formed a polarisation 

    \varepsilon \equiv (\varepsilon_1, \varepsilon_2) = \frac{\iint I(\mathbf{x}) W(\mathbf{x}) r^2 (\cos 2\theta, \sin 2\theta) d^2x}{\iint I(\mathbf{x}) W(\mathbf{x}) r^2 d^2x} ,        (20) 

where 

    W(\mathbf{x}) = \exp(-r^2 / 2 r_g^2) .        (21) 
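
A minimal Python sketch of this weighted-moment measurement is given below, assuming the polarisation is the ratio of Gaussian-weighted quadrupole moments about a known centroid. The test image, the function names and the choice of centroid are assumptions of the example and do not reproduce any of the KSB+ pipelines.

```python
import math


def elliptical_gaussian(size, sigma_major, sigma_minor, angle=0.0):
    """Pixel-centre samples of an elliptical Gaussian centred on the grid."""
    centre = (size - 1) / 2.0
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    image = []
    for iy in range(size):
        row = []
        for ix in range(size):
            dx, dy = ix - centre, iy - centre
            u = dx * cos_a + dy * sin_a      # coordinate along the major axis
            v = -dx * sin_a + dy * cos_a     # coordinate along the minor axis
            row.append(math.exp(-0.5 * ((u / sigma_major) ** 2 + (v / sigma_minor) ** 2)))
        image.append(row)
    return image


def polarisation(image, x0, y0, r_g):
    """Gaussian-weighted quadrupole polarisation (e1, e2) about (x0, y0)."""
    num1 = num2 = den = 0.0
    for iy, row in enumerate(image):
        for ix, flux in enumerate(row):
            dx, dy = ix - x0, iy - y0
            r2 = dx * dx + dy * dy
            weight = math.exp(-0.5 * r2 / r_g ** 2)
            num1 += flux * weight * (dx * dx - dy * dy)   # r^2 cos(2 theta)
            num2 += flux * weight * (2.0 * dx * dy)       # r^2 sin(2 theta)
            den += flux * weight * r2
    return num1 / den, num2 / den


galaxy = elliptical_gaussian(41, 4.0, 2.0)
print("polarisation of a galaxy elongated along x:", polarisation(galaxy, 20.0, 20.0, 3.0))
```

Because the weight function is circular, the measured polarisation is smaller than the unweighted ellipticity of the object, which is one reason a susceptibility correction is needed afterwards.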



The polarisation is corrected for smoothing of the PSF via the 
smear susceptibility tensor $P^{\rm sm}$ and calibrated as shears via the 
shear polarisability tensor $P^{\rm sh}$: both of which involve higher order 
shape moments. Using stars to denote measurements from stars 
(for which a smaller weight function is sometimes used) instead of 
galaxies, these form a shear estimator 

    \tilde{\gamma} = (P^\gamma)^{-1} [ \varepsilon - P^{\rm sm} (P^{{\rm sm}*})^{-1} \varepsilon^* ] ,        (22) 

where 

    P^\gamma = P^{\rm sh} - P^{\rm sm} (P^{{\rm sm}*})^{-1} P^{{\rm sh}*} .        (23) 

Footnote: the IMCAT code is available from http://www.ifa.hawaii.edu/~kaiser/imcat 



The tensor inversions can be performed in full, but these measurements 
of faint objects are particularly noisy. In practice, since the 
diagonal elements of $P^\gamma$ are similar, and its off-diagonal elements 
are about an order of magnitude smaller, it can be approximated as 
a scalar quantity. Many implementations of KSB+ therefore simply 
divide by a shear susceptibility factor. The noise in $P^\gamma$ is also sometimes 
reduced by fitting it from the entire population as a function 
of other observable quantities like galaxy size and magnitude. Reducing 
noise in any nonlinear aspect of shear measurement is vital, 
because the lensing signal is so much smaller than both the intrinsic 
ellipticity and photon shot noise, and must be obtained by linearly 
averaging away those sources of noise over a large population of 
galaxies. 
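
The two options (full tensor inversion, or division by a scalar) can be compared with a short Python sketch. The numerical values of the susceptibility tensor are invented for illustration, and the function names are hypothetical; the scalar is taken to be half the trace, as listed for several implementations in table 4.

```python
def invert_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("singular susceptibility tensor")
    return [[d / det, -b / det], [-c / det, a / det]]


def shear_from_tensor(p_gamma, eps):
    """Full inversion: gamma = (P^gamma)^-1 eps."""
    inv = invert_2x2(p_gamma)
    return (inv[0][0] * eps[0] + inv[0][1] * eps[1],
            inv[1][0] * eps[0] + inv[1][1] * eps[1])


def shear_from_half_trace(p_gamma, eps):
    """Scalar approximation: divide by half the trace of P^gamma."""
    scalar = 0.5 * (p_gamma[0][0] + p_gamma[1][1])
    return eps[0] / scalar, eps[1] / scalar


# Invented tensor: similar diagonal elements, much smaller off-diagonal ones.
p_gamma = [[0.52, 0.03], [0.02, 0.48]]
eps = (0.025, -0.010)
print("full inversion:  ", shear_from_tensor(p_gamma, eps))
print("half-trace scalar:", shear_from_half_trace(p_gamma, eps))
```

For a noisy individual galaxy the determinant in the full inversion can approach zero, which is one practical motivation for the scalar or population-fitted alternatives.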

Unfortunately, fundamental limitations in the mathematical 
formalism of KSB+ introduce further decisions that must also be 
resolved to approximate an ideal scenario in practical implementations. 
The KSB+ method makes no provision for the effects of 
pixellisation; assumes that the PSF isophotes are concentric; and 
is mathematically ill-defined for non-Gaussian or non-concentric 
PSF and galaxy profiles. The various implementations developed 
by groups participating in the STEP2 analysis represent a cross- 
section of those choices. 

Since STEP1, the TS method has incorporated a shear calibration 
factor of $0.91^{-1}$, determined from the STEP1 results, but 
without knowledge of the STEP2 data. STEP2 therefore tests the 
robustness of this sort of calibration. As in STEP1, the C1 and 
C2 methods incorporate a calibration factor of $0.95^{-1}$ to eliminate 
the effect of close galaxy pairs. The C1 method uses a constant 
model of the PSF for all galaxies; the C2 method lets the size of the 
weight function $r_g^* = r_g$ change to match each galaxy. The new 
SP method numerically integrates weight functions within pixels, 
uses the trace of $P^\gamma$ from individual galaxies, and similar galaxy 
weights to the HH method. The ES1 method is based upon the LV 
method from STEP1 but, rather than fitting the shear susceptibility 
from the galaxy population as a function of size and magnitude, it 
finds the twenty most similar galaxies in terms of those parameters, 
and uses their average value. This same procedure was used in 
the Semboloni et al. (2006) analysis of the CFHTLS deep survey. 

Subsequent tests on STEP1 images suggested that better results 
could be obtained by using individual measurements of $P^\gamma$ from 
each galaxy, and ignoring the galaxy weights. These improvements 
have been incorporated into the new ES2 method. 
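
The nearest-neighbour smoothing used by ES1 might be sketched as follows. The text does not specify the distance metric or whether a galaxy counts among its own neighbours, so both choices here (range-normalised Euclidean distance in size and magnitude, with the galaxy itself included) are assumptions, as are the dictionary keys and the function name.

```python
def smoothed_susceptibility(galaxies, k=20):
    """For each galaxy, average P^gamma over its k nearest neighbours in (size, mag)."""
    n = len(galaxies)
    sizes = [g["size"] for g in galaxies]
    mags = [g["mag"] for g in galaxies]
    # Assumed metric: Euclidean distance after scaling each axis by its range.
    size_range = (max(sizes) - min(sizes)) or 1.0
    mag_range = (max(mags) - min(mags)) or 1.0
    result = []
    for g in galaxies:
        distances = sorted(
            (((g["size"] - h["size"]) / size_range) ** 2
             + ((g["mag"] - h["mag"]) / mag_range) ** 2, index)
            for index, h in enumerate(galaxies))
        nearest = [index for _, index in distances[:min(k, n)]]
        result.append(sum(galaxies[i]["p_gamma"] for i in nearest) / len(nearest))
    return result


# Tiny invented catalogue; k is reduced to 2 so the grouping is easy to follow.
catalogue = [
    {"size": 1.0, "mag": 20.0, "p_gamma": 0.2},
    {"size": 1.1, "mag": 20.0, "p_gamma": 0.4},
    {"size": 5.0, "mag": 20.0, "p_gamma": 0.9},
    {"size": 5.1, "mag": 20.0, "p_gamma": 0.7},
]
print(smoothed_susceptibility(catalogue, k=2))
```

The brute-force neighbour search scales quadratically with catalogue size; a spatial index would be preferable for a real survey catalogue.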

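One final finesse is required for methods that use weights $w_i$ 
on each galaxy $i$ that could vary between the rotated and unrotated 
images. For all $N$ pairs of galaxies, we determine normalised 
weights 

    w'_i = N w_i / \sum_j w_j        (24) 

and then calculate three estimates of the mean shear in each image 

    \langle\tilde{\gamma}^{\rm unrot}\rangle = \frac{1}{N} \sum w'^{\rm unrot} \varepsilon^{\rm obs,unrot}        (25) 

    \langle\tilde{\gamma}^{\rm rot}\rangle = \frac{1}{N} \sum w'^{\rm rot} \varepsilon^{\rm obs,rot}        (26) 

    \langle\tilde{\gamma}\rangle = \frac{1}{2N} \sum ( w'^{\rm unrot} \varepsilon^{\rm obs,unrot} + w'^{\rm rot} \varepsilon^{\rm obs,rot} ) .        (27) 

Errors on these are estimated using a bootstrap technique. 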
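
The pairing and the bootstrap can be illustrated with the Python sketch below, which treats a single shear component and resamples whole galaxy pairs with replacement. The resampling unit, the number of resamples, the seed and all function names are assumptions of the example; the text states only that a bootstrap technique was used.

```python
import random


def normalise(weights):
    """Rescale weights so that they sum to the number of galaxies."""
    total = sum(weights)
    return [len(weights) * w / total for w in weights]


def mean_shear(values, weights):
    """Weighted mean of one image's estimates, using normalised weights."""
    wn = normalise(weights)
    return sum(w * v for w, v in zip(wn, values)) / len(values)


def pair_mean_shear(e_unrot, w_unrot, e_rot, w_rot):
    """Combined estimate from the unrotated and rotated images of each pair."""
    wu, wr = normalise(w_unrot), normalise(w_rot)
    total = sum(a * x + b * y for a, x, b, y in zip(wu, e_unrot, wr, e_rot))
    return total / (2 * len(e_unrot))


def bootstrap_error(e_unrot, w_unrot, e_rot, w_rot, n_resamples=200, seed=0):
    """Standard deviation of the combined estimate over resampled pair lists."""
    rng = random.Random(seed)
    n = len(e_unrot)
    estimates = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(pair_mean_shear([e_unrot[i] for i in idx], [w_unrot[i] for i in idx],
                                         [e_rot[i] for i in idx], [w_rot[i] for i in idx]))
    mean = sum(estimates) / n_resamples
    return (sum((e - mean) ** 2 for e in estimates) / (n_resamples - 1)) ** 0.5


# Invented pairs: intrinsic shapes flip sign in the rotated image, the shear does not.
shear = 0.02
intrinsic = [0.30, -0.20, 0.10, -0.25]
unrot = [e + shear for e in intrinsic]
rot = [-e + shear for e in intrinsic]
ones = [1.0] * len(intrinsic)
print("unrotated images only:", mean_shear(unrot, ones))
print("pairs combined:       ", pair_mean_shear(unrot, ones, rot, ones))
print("bootstrap error:      ", bootstrap_error(unrot, ones, rot, ones))
```

With equal weights the intrinsic shapes cancel exactly within each pair; unequal weights between the two images of a pair leave a residual, which is why the weights are normalised separately.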



3.1.2 Reglens (RM) 

The Reglens (RM) method consists of two parts: the SDSS 
data processing pipeline PHOTO (Ivezić et al. 2004), followed 
by the re-Gaussianization pipeline (Hirata & Seljak 2003; 
Mandelbaum et al. 2005). The magnitude cut was adjusted, and 
one additional subroutine was required for the STEP2 analysis, to 
properly determine the noise variance in the presence of correlated 
background noise. The STEP2 images are more crowded than 
SDSS images, leading to occasional deblending problems. Objects 
with failed deblending were automatically eliminated, after visual 
inspection indicated that nearly all of them were really several 
galaxies very close to each other. 

PSF correction is performed via a two-step procedure that addresses 
KSB+'s limitation of being exact only in the limit of Gaussian 
PSF and galaxy profiles. The PSF is first split into a Gaussian 
component $G(\mathbf{x})$ plus a small residual $\epsilon(\mathbf{x})$, so that the observed 
image 

    I = (G + \epsilon) \otimes f = G \otimes f + \epsilon \otimes f ,        (28) 

where $f(\mathbf{x})$ is the galaxy image before convolution with the PSF, and 
$\otimes$ signifies convolution. Assuming knowledge of $f$, it would be 
possible to find 

    I' \equiv G \otimes f = I - \epsilon \otimes f ,        (29) 

the galaxy image as it would appear when convolved with a perfectly 
Gaussian PSF. Although $f$ is not known in practice, it is 
convolved with a small correction $\epsilon$ in the final equality, so equation (29) 
is fairly accurate even with an approximation $f_0$. The 
SDSS and STEP2 analyses used an elliptical Gaussian as $f_0$, with 
its size and ellipticity determined from the difference between the 
best-fit Gaussians to the observed image and the full PSF. Possible 
alternatives to this approximation are discussed in Hirata & Seljak 
(2003). 

Correction for the isotropic part of the now Gaussian PSF requires 
a subtraction similar to that in KSB+ equation (22), except 
that Reglens directly subtracts moments of the PSF from those of 
the galaxy (i.e. the numerator and denominator of equation (20)) 
before they are divided (i.e. the ratio in equation (20)). Furthermore, 
the moments are calculated using weight functions $W_{I'}(\mathbf{x})$ 
and $W_G(\mathbf{x})$ that are the best-fitting elliptical Gaussians to the image 
and to the PSF respectively. The advantage of these adaptive 
weight functions is that they do not bias the shape measurement or 
require later correction. Correction for the anisotropic part of the 
Gaussian PSF is finally performed by shearing the coordinate system, 
including $I'$, until $G$ is circular. 
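
The first, re-Gaussianization step can be illustrated in one dimension. The Python sketch below builds an observed profile from a non-Gaussian PSF, then subtracts the residual convolved with a guess at the galaxy profile. All profiles, widths and function names are invented for the illustration, and the Gaussian standing in for $G$ is simply chosen by hand rather than fitted.

```python
import math


def gaussian(n, sigma):
    """Unit-sum Gaussian sampled on n pixels, centred on pixel n // 2."""
    centre = n // 2
    values = [math.exp(-0.5 * ((i - centre) / sigma) ** 2) for i in range(n)]
    total = sum(values)
    return [v / total for v in values]


def convolve(a, b):
    """Circular convolution of two equal-length profiles centred on pixel n // 2."""
    n = len(a)
    centre = n // 2
    return [sum(a[j] * b[(i - j + centre) % n] for j in range(n)) for i in range(n)]


def regaussianise(observed, psf, gauss_psf, galaxy_guess):
    """I' = I - eps (x) f0, where eps = PSF - G is the non-Gaussian residual."""
    eps = [p - g for p, g in zip(psf, gauss_psf)]
    correction = convolve(eps, galaxy_guess)
    return [i - c for i, c in zip(observed, correction)]


def max_difference(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


n = 64
psf = [0.9 * a + 0.1 * b for a, b in zip(gaussian(n, 2.0), gaussian(n, 4.0))]
gauss_psf = gaussian(n, 2.2)     # hand-picked stand-in for the Gaussian component G
galaxy = gaussian(n, 3.0)        # true pre-seeing profile f
guess = gaussian(n, 3.3)         # imperfect approximation f0

observed = convolve(psf, galaxy)
target = convolve(gauss_psf, galaxy)
print("residual with f0 = f:          ", max_difference(regaussianise(observed, psf, gauss_psf, galaxy), target))
print("residual with approximate f0:  ", max_difference(regaussianise(observed, psf, gauss_psf, guess), target))
print("residual with no correction:   ", max_difference(observed, target))
```

Printing the three residuals side by side shows how much of the non-Gaussian contribution is removed for a given quality of the guess $f_0$.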

In the absence of galaxy weights, a shear estimate for each 
galaxy would be computed via equation (19). The shear responsivity 

    \mathcal{R} = 2 - \langle \varepsilon_1^2 + \varepsilon_2^2 - s_{\varepsilon_1}^2 - s_{\varepsilon_2}^2 \rangle        (30) 

is calculated from shape distribution statistics of the entire galaxy 
population and the error on each polarisation, $s_{\varepsilon_i}$, is calculated by 
propagating measured photon shot noise in the image. During our 
analysis, it became apparent that, for the RM, MJ, MJ2 and RN 
methods, it is necessary to recalculate $\mathcal{R}$ in each bin of galaxy size 
or magnitude when the catalogue is so split. 

Author  Pixellisation          Galaxy weighting scheme                                                            Calibration factor  Shear susceptibility 
------  ---------------------  ---------------------------------------------------------------------------------  ------------------  -------------------- 
JB      Analytic integration   None                                                                               -                   Global mean shear responsivity $\mathcal{R} = 2 - \langle\varepsilon^2\rangle$ 
C1      Centre of pixel        $\min(\nu, 40)$                                                                    1/0.95              $\frac{1}{2}{\rm Tr}[P^\gamma]$, fitted as $f(r_g, \varepsilon_i)$ 
C2      Centre of pixel        $\min(\nu, 40)$                                                                    1/0.95              $\frac{1}{2}{\rm Tr}[P^\gamma]$, fitted as $f(r_g, \varepsilon_i)$ 
MH      Numerical integration  $1/(0.15 + \sigma_\varepsilon^2 + \sigma(\frac{1}{2}{\rm Tr}[P^\gamma(r_g)])^2)$   1/0.88              $\frac{1}{2}{\rm Tr}[P^\gamma]$, from individual galaxies 
HH      Numerical integration  $1/(\ldots + s^2/((1 - \varepsilon^2/2)\frac{1}{2}{\rm Tr}[P^\gamma])^2)$          -                   $(1 - \varepsilon^2/2)\frac{1}{2}{\rm Tr}[P^\gamma]$, fitted as $f(r_g)$ 
MJ      Centre of pixel        $1/\sqrt{\varepsilon^2 + 2.25 s_0^2}$                                              -                   Global mean shear responsivity $\mathcal{R}$ 
MJ2     Centre of pixel        $1/s_0^2$                                                                          -                   Global mean shear responsivity $\mathcal{R}$ 
KK      Centre of pixel        $1/(0.1^2 + \sigma_{\varepsilon_1}^2 + \sigma_{\varepsilon_2}^2)$                  -                   Global mean shear responsivity $\mathcal{R} = 1 - \langle\varepsilon^2\rangle$ 
RM      Centre of pixel        $f(S/N)$                                                                           -                   Global mean shear responsivity $\mathcal{R}$ 
RN      Centre of pixel        $1/\sqrt{\varepsilon^2 + 2.25 s_0^2}$                                              -                   Global mean shear responsivity $\mathcal{R}$ 
SP      Numerical integration  $1/(0.15 + \sigma_\varepsilon^2 + \sigma(\frac{1}{2}{\rm Tr}[P^\gamma(r_g)])^2)$   -                   $\frac{1}{2}{\rm Tr}[P^\gamma]$, from individual galaxies 
MS1     Numerical integration  $1/\sigma_\varepsilon^2(r_g, {\rm mag})$                                           -                   $\frac{1}{2}{\rm Tr}[P^\gamma]$, fitted as $f(r_g, {\rm mag})$ 
MS2     Numerical integration  $1/\sigma_\varepsilon^2(r_g, {\rm mag})$                                           -                   Full $P^\gamma$ tensor, fitted as $f(r_g, {\rm mag})$ 
TS      Numerical integration  None                                                                               1/0.91              $\frac{1}{2}{\rm Tr}[P^\gamma]$, from individual galaxies 
ES1     Numerical integration  $1/(\sigma_\varepsilon^2(r_g, {\rm mag}) + 0.44^2)$                                -                   $\frac{1}{2}{\rm Tr}[P^\gamma]$, smoothed from galaxy population $f(r_g, {\rm mag})$ 
ES2     Numerical integration  None                                                                               -                   $\frac{1}{2}{\rm Tr}[P^\gamma]$, from individual galaxies 

Table 4. Choices adopted by each of the shear measurement methods that significantly affect their performance in this paper. See the appendix in STEP1 for 
more details about the differences between the various implementations of KSB+. 

To improve the signal to noise, galaxies are each weighted by 
a factor 



(31) 



An estimate of the mean shear in each image is then simply 

    \langle\tilde{\gamma}\rangle = \sum w \tilde{\gamma} / \sum w ,        (32) 

with a shear responsivity (Bernstein & Jarvis 2002) 

    \mathcal{R} = \sum w (2 - 2k_0 - k_1 |\varepsilon|^2) / \sum w ,        (33) 

where $k_0 = \sigma^2 - w\sigma^4$ and $k_1 = w^2\sigma^4$. 

Note that this calculation of $\mathcal{R}$ in the STEP2 images is much 
more uncertain than in SDSS data, because the correlated background 
noise in the STEP2 images is not as well understood. Consequently, 
this may introduce some bias into the STEP2 results that 
does not exist with the real data. 



3.1.3 Other methods not tested in this paper 

Rhodes et al. (2000; RRG) is a modification of the KSB+ method 
for space-based data in which the PSF is small. In this limit, $\varepsilon^*$ 
becomes noisy. Like Reglens, RRG therefore deals directly with 
moments rather than polarisations for as long as possible, and performs 
the subtraction before the division. The moments use a circular 
weight function, and therefore require correction for this truncation 
as well as the PSF. RRG uses a global shear responsivity 
$\mathcal{R} \approx 2 - \langle\varepsilon^2\rangle$. 

Kaiser (2000; K2K) also seeks a resolution of the Gaussian 
PSF limitation in KSB+. The galaxy image is first convolved by an 
additional "re-circularising kernel", which is a modelled version of 
the observed PSF that has been rotated by 90°. PSF correction and 
shear measurement are thereafter fairly similar to KSB. However, 
particular efforts are made to correct biases that arise from the use 
of $P^\gamma$ measured after shear rather than before shear. 

Ellipto (Smith, Bernstein, Fischer & Jarvis 2001) also uses a 
re-circularising kernel to eliminate the anisotropic component of 
the PSF, following Fischer & Tyson (1997). It then repeats object 
detection to remove PSF-dependent selection biases. Galaxy polarisations 
are derived from moments weighted by the best-fit elliptical 
Gaussian. It is a partial implementation of BJ02, discussed in the 
next section, and primarily differs from BJ02 by using a simpler 
re-circularising kernel. 



3.2 Blue class methods 

3.2.1 BJ02 (MJ and MJ2) 

The remaining methods are based upon expansions of the galaxy 
and PSF shapes into Gauss-Laguerre ("shapelet") basis functions. 
The JB and KK methods use them with a circular basis function, as 
defined in equations Q and js), while the MJ, MJ2 and RN methods 
use more general elliptical versions. Shapelets are a natural extension 
of KSB+ to higher order. The first few shapelet basis functions 
are precisely the weight functions used in KSB+, with $r_g$ reinterpreted 
as the shapelet scale size $\beta$. Generalised versions of the 
$P^{\rm sh}$ and $P^{\rm sm}$ matrices are derived in Refregier & Bacon (2003). 
Extending the basis set to higher order than KSB+ allows complex 
shapes of galaxies and PSFs to be well described, even when the ellipticity 
varies as a function of object radius. The shapelet basis set 
is mathematically well-suited to shear measurement because of the 
simple transformation of shapelet coefficients during typical image 
manipulation. 

The two Jarvis (MJ, MJ2) methods correct for the anisotropic 
component of the PSF by first convolving the image with an additional, 
spatially-varying kernel that is effectively 5 x 5 pixels. 
This convolution is designed to null both the Gaussian-weighted 
quadrupole of the PSF as well as its next higher $m = 2$ shapelet coefficient 
(since it is the $m = 2$ components of the PSF that mostly 
affect the observed shapes of galaxies). For PSF ellipticities of 
order ~0.1 or less, a 5 x 5 pixel kernel is sufficient to round a 
typical PSF up to approximately 50 pixels in diameter; much larger 
than the PSFs used in this study. 

The shapelet basis functions are sheared, to make them elliptical, 
then pixellated by being evaluated at the centre of each pixel. 
Shapelet coefficients $f_{n,m}$ are determined for each galaxy in 
distorted coordinate systems, and the polarisability $\varepsilon$ is defined as 
-1 times the amount of distortion that makes each object appear 
round (i.e. $f_{2,2} = 0$). Some iteration is required to get this measurement 
to converge. In the distorted coordinate frame where the 
galaxy is round, the weight function for this coefficient is a circular 
Gaussian of the same size as the galaxy. Matching the shape 
of the weight function to that of the galaxy has the advantage that 
the polarisability no longer requires correction for truncation biases 
introduced by the weight function. 

Finally, a correction for the PSF dilution (the circularising effect 
of the PSF) is applied by also transforming the PSF into this coordinate 
system, then using formulae proposed by Hirata & Seljak 
(2003). 

The two methods (MJ, MJ2) differ only in the weights applied 
to each galaxy. The MJ method is identical to the MJ method used 
for the STEP1 study. It uses weights 

    w_{\rm MJ} = 1 / \sqrt{\varepsilon^2 + 2.25 s_0^2} ,        (34) 

where $s_0$ is the uncertainty in the polarisability due to image shot 
noise, as measured in the coordinate system where the galaxy is 
round. STEP1 revealed that this optimised weight gave incorrect responsivities 
as the input shear became large (~0.1). For this study, 
method MJ2 was therefore added, which is identical except that it 
uses weights that are not a function of the galaxies' polarisations 



WMJ2 



(35) 



These weights should be less biased for larger input shears. The MJ 
weight might be more appropriate for cosmic shear measurements, 
and the MJ2 weight for cluster lensing. 

The shear responsivity $\mathcal{R}$ for the MJ2 method is the same as 
that in equation (33). For the ellipticity-dependent weight used by 
the MJ method, this is generalised to 

    \mathcal{R} = \sum [ w (2 - 2k_0 - k_1|\varepsilon|^2) + \varepsilon (\partial w/\partial\varepsilon) (1 - k_0 - k_1|\varepsilon|^2) ] / \sum w ,        (36) 

where the summations are over the entire galaxy population, or for 
each size or magnitude bin. For either method, an estimate of the 
mean shear in each image is then 



Note that, in the absence of shape noise, equation (36) reproduces 
the extra $(1 - \varepsilon^2/2)$ term multiplying $P^\gamma$ in the HH implementation 
of KSB+ (see table 4). 



3.3 Orange class methods 

3.3.1 Shapelets (JB) 

The Berge (JB) shear measurement method uses a parametric 
shapelet model to attempt a full deconvolution of each galaxy 
from the PSF. Deconvolution is an ill-defined operation in general, 
since information is irrevocably lost during convolution. In shapelet 
space, however, it is easy to restrict the galaxy model to include 
only that range of physical scales in which information is expected 
to survive. Massey & Refregier (2005) describes an iterative algorithm 
designed to optimise the scale size of the shapelets and to thus 
capture the maximum range of available scales for each individual 
galaxy. A complete software package to perform this analysis and 
shapelet manipulation is publicly available from the shapelets web 
site. 

To model a deconvolved galaxy shape, the basis functions are 
first convolved with the PSF in shapelet space, then integrated analytically 
within pixels: thus undergoing the same processes as real 
photons incident upon a CCD detector. The convolved basis functions 
are then fit to the data, with the shapelet coefficients as free parameters. 
Reassembling the model using unconvolved basis functions 
produces a deconvolved reconstruction of each galaxy. This 
performs better than a Wiener-filtered deconvolution in Fourier 
space, because shapelets have a preferred centre. The available basis 
functions act as a prior on the reconstruction, localising it in 
real space (and also allowing a slightly higher resolution at the central 
cusp than at large radii). The deconvolved model can also be 
rendered free of noise by ensuring that a sufficient range of scales 
are modelled to lower the residual $\chi^2_{\rm reduced}$ to exactly unity. Unfortunately, 
achieving exactly this target is hindered by the presence 
of correlated background noise in the STEP2 simulations. Incorporating 
the noise covariance matrix is mathematically trivial but 
computationally unfeasible, and a practical implementation has not 
yet been developed. Proceeding regardless, the shape of this analytic 
model can be directly measured (see Massey et al. 2004; 
Massey et al. 2006), including its unweighted moments. These can 
not be measured directly from real data because observational noise 
prevents the relevant integrals from converging. 

Once a deconvolved model is obtained, extraction of a shear 
estimator is easy. It could mimic the KSB method. However, removing 
the weight function (like the Gaussian in equation (20)) 
makes the polarisation itself into an unbiased shear estimator 

    \varepsilon \equiv (\varepsilon_1, \varepsilon_2) = \frac{\iint f(\mathbf{x}) r^2 (\cos 2\theta, \sin 2\theta) d^2x}{\iint f(\mathbf{x}) r^2 d^2x} .        (38) 

The numerator of this expression has a shear susceptibility equal 
to the denominator. But that denominator is a scalar quantity, with 
explicitly zero off-diagonal elements in the susceptibility tensor, 
which can therefore be easily inverted. It is also a simple product 
of a galaxy's flux and size, both low-order quantities that can be 
robustly measured. The method is intended to be completely linear 
for as long as possible, and to introduce minimal bias for even faint 
objects in this final division. Since the denominator also changes 
during a shear, a population of galaxies acquires an overall shear 
responsivity factor 

    \mathcal{R} = 2 - \langle\varepsilon^2\rangle .        (39) 
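
The factor of two in the responsivity can be checked numerically for the simplest case of an intrinsically circular source. The Python sketch below shears a circular Gaussian model by a small amount along one axis and measures its unweighted polarisation; the model, grid size and function names are assumptions of the example, standing in for a noise-free deconvolved shapelet model.

```python
import math


def sheared_circular_gaussian(size, sigma, g1):
    """Circular Gaussian stretched along x by (1 + g1) and squeezed along y by (1 - g1)."""
    centre = (size - 1) / 2.0
    sx, sy = sigma * (1.0 + g1), sigma * (1.0 - g1)
    return [[math.exp(-0.5 * (((ix - centre) / sx) ** 2 + ((iy - centre) / sy) ** 2))
             for ix in range(size)] for iy in range(size)]


def unweighted_polarisation(model):
    """Unweighted quadrupole polarisation (e1, e2) about the centre of the grid."""
    size = len(model)
    centre = (size - 1) / 2.0
    num1 = num2 = den = 0.0
    for iy, row in enumerate(model):
        for ix, flux in enumerate(row):
            dx, dy = ix - centre, iy - centre
            num1 += flux * (dx * dx - dy * dy)
            num2 += flux * 2.0 * dx * dy
            den += flux * (dx * dx + dy * dy)
    return num1 / den, num2 / den


g1 = 0.05
e1, e2 = unweighted_polarisation(sheared_circular_gaussian(61, 4.0, g1))
print("polarisation:", (e1, e2))
print("polarisation / 2:", e1 / 2.0, "input shear:", g1)
```

For this circular source the intrinsic term $\langle\varepsilon^2\rangle$ vanishes, so dividing by 2 recovers the input shear to first order; a population with intrinsic ellipticities needs the smaller responsivity of equation (39).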



The method is still under development. The shear responsivity 
factor has currently been calculated only from the entire 
galaxy population. No weighting scheme has yet been applied to 
the shear catalogue when calculating mean shears. Once galaxies 
have passed crude cuts in size, flux, and flags (which indicate successful 
convergence of the shapelet series and of the iteration), they 
are all counted equally. These aspects will be improved in the future. 

3.4 Green class methods 

3.4.1 Shapelets (KK) 

The Kuijken (KK) shear measurement method assumes that each 
galaxy was intrinsically circular, then shears it, and smears it by 
the PSF, until it most closely matches the observed image. The 
shear required is then stored as the polarisation $\varepsilon$. As described in 
Kuijken (2006), this approach is desirable, because it is understood 
precisely how a circular object changes under a shear. 

Footnote: the shapelets web site is http://www.astro.caltech.edu/~rjm/shapelets 

This process could operate in real space; however, the convenient 
properties of shapelets make the required image manipulations 
easier and faster in shapelet space. The pixellated image 
need be accessed only once, when each galaxy is initially decomposed 
into shapelets (without deconvolution). Models of circular 
sources can have arbitrary radial profiles, parametrized by shapelet 
coefficients with $m = 0$ and $n \leq 12$. This is sheared in shapelet 
space to first order in $\gamma$, although, in principle, this could also be 
increased to accommodate more highly elliptical objects. Also in 
shapelet space, it is smeared by a model of the PSF. Since there 
is only one shapelet decomposition overall, and one forward convolution 
for each object, the code is much faster than the Berge 
(JB) method. Furthermore, the decomposition uses completely orthogonal 
shapelet basis functions, so the errors on shapelet coefficients 
are also uncorrelated at that stage. To avoid iterating the 
decomposition, the optimum scale size $\beta$ for each object is approximated 
from SExtractor parameters, and the range of scales is 
fixed in advance. In the current implementation, the basis functions 
are evaluated at the centre of each pixel. Since both the PSF and 
the galaxy are pixellated, the effects of pixellisation ought to drop out. In terms 
of the orthogonality of the shapelet basis functions, this approach 
is satisfactory as long as the range of scales is small, and oscillations 
in the basis functions remain larger than the pixel scale (c.f. 
Berry, Hobson & Withington 2004). 

To determine the shear required to make a circular source 
match each real galaxy, a fit is performed using a Numerical Recipes 
Newton-Raphson algorithm, which is quadratic in shapelet coefficients, 
the centroid and the shear. Since the galaxies are not really 
all circular, in practice the global population does have a non-trivial 
shear susceptibility or "responsivity" $\mathcal{R}$. For an ensemble population 
of galaxies, this is a scalar quantity. As can be deduced from 
equation (11), it involves the variance of the intrinsic polarisation 
distribution 

    \mathcal{R} = 1 - \langle\varepsilon^2\rangle .        (40) 

Unlike other methods that use a shear responsivity correction, this 
quantity was calculated only once for the KK method, from the 
entire galaxy population. However, the calculation of $\langle\varepsilon^2\rangle$ properly 
takes into account the galaxy weights 



E w(ei + 62) 



,(41) 



where $s_{\varepsilon_i}$ is the noise on each polarisation calculated by propagating 
photon shot noise, and the weight for each galaxy is 



1 



(42) 



Note that the estimates of errors on the polarisations did not take 
into account the fact that the background noise was correlated be- 
tween adjacent pixels, and are therefore likely to be underesti- 
mated. 

Shear estimates for individual galaxies are then computed similarly 
to equation (37), but where $\tilde{\gamma} = \varepsilon/\mathcal{R}$ here. 

3.4.2 BJ02 (RN) 

The "deconvolution fitting method" by Nakajima (RN) implements 
nearly the full formalism proposed by BJ02, which is further elaborated 
in Nakajima & Bernstein (2006). Like MJ and MJ2, it shears 
the shapelet basis functions until they match the ellipticity of the 
galaxy. The amount of distortion that makes an object appear round 
(i.e. $f_{2,2} = 0$) defines the negative of its polarisability $\varepsilon$. 

Since no PSF interpolation scheme has yet been developed, 
the pipeline deviates from the STEP rules by using prior knowl- 
edge that the PSF is constant across each image (but not between 
images). Deconvolution from the PSF is performed in a similar 
fashion to the JB method. The Gauss-Laguerre basis functions are 
convolved with the PSF to obtain a new basis set. These are eval- 
uated at the centre of each pixel. The new basis functions are fit- 
ted directly to the observed pixel values, and should fully capture 
the effect of highly asymmetric PSFs or galaxies, as well as the 
effects of finite sampling. The fit iterates until a set of sheared 
Gauss-Laguerre basis functions are obtained, in which the coefficients 
$f_{2,0} = f_{2,2} = 0$ and hence the deconvolved galaxy appears 
round. All PSF coefficients were obtained to $n \leq 12$, and galaxy 
coefficients to $n \leq 8$. 

The weights applied to each galaxy are optimised for small 
shears, using the same prescription as the MJ2 method in equation (35). 
The shear responsivity $\mathcal{R}$ is similarly calculated using equation (36), 
averaged over the entire galaxy population or within size and magnitude 
bins as necessary. 

The evolution of the RN method during the STEP2 analysis 
highlights the utility of even one set of STEP simulations. In the 
first submission, it was noticed that a few outlying shear estimates 
in each field were destabilising the result. These were identified as 
close galaxy pairs, so an algorithm was introduced to remove these, 
and the size and magnitude cuts were also gradually adjusted over 
several iterations to improve stability. 

3.4.3 Other methods not tested in this paper 

Im2shape (Bridle et al. 2001) performs a similar PSF deconvolution, 
but parametrizes each galaxy and each PSF as a sum of elliptical 
tical Gaussians. The best-fit parameters are obtained via a Markov- 
Chain Monte-Carlo sampling technique. Concentric Gaussians are 
usually used for the galaxies, in which case the ellipticity is then a 
direct measure of the shear via equations Q and J2}. For alterna- 
tive galaxy models using non-concentric Gaussians, shear estima- 
tors like that of the JB method could also be adopted. The "active" 
or "passive" classification of this method is somewhat open to in- 
terpretation. 



4 RESULTS 

Individual authors downloaded the simulated images and ran their 
own shear measurement algorithms, mimicking as closely as pos- 
sible the procedure they would have followed with real data. None 
of the authors knew the input shears at this stage. Their galaxy cat- 
alogues were then compiled by Catherine Heymans and Richard 
Massey. Independently of the other authors, the mean shears in each 
image were compared to the input values. Galaxies in the measured 
catalogues were also matched to their rotated counterparts and to 
objects in the input catalogues, with a 1" tolerance. Except for de- 
termining false detections or stellar contamination in the measured 
catalogues (which were removed in the matched catalogues), no 
results using the input shapes are presented in this paper. 
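
The catalogue matching step can be sketched as a nearest-neighbour search with a 1" tolerance. In the Python illustration below, positions are assumed to be flat-sky coordinates in arcseconds, the function name is hypothetical, and no attempt is made to enforce one-to-one matches; the procedure actually used for the STEP2 catalogues is not described in more detail in the text.

```python
import math


def match_catalogues(measured, reference, tolerance=1.0):
    """Map each measured object to its nearest reference object within `tolerance`."""
    matches = {}
    for i, (x, y) in enumerate(measured):
        best_j, best_d = None, tolerance
        for j, (u, v) in enumerate(reference):
            d = math.hypot(x - u, y - v)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            matches[i] = best_j
    return matches


# Invented positions in arcseconds.
measured = [(10.0, 10.0), (50.2, 49.9), (80.0, 80.0)]
reference = [(10.3, 10.4), (50.0, 50.0), (200.0, 200.0)]
matches = match_catalogues(measured, reference)
unmatched = [i for i in range(len(measured)) if i not in matches]
print("matches:", matches)
print("unmatched measured objects (candidate false detections):", unmatched)
```

Measured objects left without a counterpart in the input catalogue are the candidates for false detections, which is the only use of the input catalogue in what follows.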

In this section, we present low level data from the analyses, 
in terms of direct observables. For further discussion and interpretation 
of the results in terms of variables concerning global survey 
and instrumental performance, see §5. To conserve space, only a 
representative sample of the many results are displayed here. The 
rest is described in the text, in relation to the illustrative examples, 
and is also available from the STEP website. First, we shall describe 
the measurement of stars; then the number density of galaxies 
and then shears in each set of images. Finally, we shall split the 
galaxy catalogues by objects' observed sizes and magnitudes. 

PSF model from TS implementation of KSB+ 

Image set  FLUX_RADIUS  $\varepsilon_1$    $\varepsilon_2$ 
---------  -----------  -----------------  ----------------- 
A          0.334"       -(0.68±0.10)%      (1.21±0.07)% 
B          0.334"       -(0.66±0.07)%      (1.28±0.05)% 
C          0.406"       -(0.47±0.07)%      (0.97±0.06)% 
D          0.390"       (11.49±0.11)%      (2.20±0.14)% 
E          0.390"       -(2.21±0.14)%      (11.29±0.16)% 
F          0.392"       -(0.01±0.12)%      (0.01±0.01)% 

Table 5. PSF models for the six sets of images used in the STEP2 analysis 
by the TS implementation of KSB+, averaged over stars in the simulated 
images. These quantities may be more familiar to some readers. 
FLUX_RADIUS is directly from SExtractor, and the ellipticities are all 
measured using a Gaussian weight function of rms size $r_g$ = 0.6" = 
3 pixels. 

4.1 PSF modelling 

The first task for all shear measurement methods is to identify stars
and measure the shape of the PSF. Table 5 lists parameters of the
PSF model generated by the TS implementation of KSB+. These
quantities are more familiar than those derived analytically from
the shapelet models, and also demonstrate the differences between
measured PSF ellipticities and inputs described in tableQ The few 
percent polarisations measured for components of PSFs D and E
that should be zero are typical of several other methods. These may
explain the peculiar residual shear offsets described in §5.3.

4.2 Galaxy number counts and the false detection rate 

The methods used a variety of object detection algorithms and
catalogue selection criteria. For each method and each PSF, table 6 lists
the density of objects per square arcminute, $n_{\rm gals}$, their mean
magnitude, and the percentage of false detections. Clearly, methods that
are able to successfully measure the shapes of more (fainter) galaxies,
while avoiding false detections, will obtain a stronger measurement
of weak lensing, especially because the lensing signal grows
cumulatively with galaxy redshift. The false detection and stellar
contamination rate is generally low, and the effective survey depth
is lowered by less than 0.1 magnitudes for all methods after matching
rotated and unrotated catalogues. Nor does matching have a
significant effect upon the overall mean polarisation of galaxies,
which is always consistent with zero both before and after matching,
as might not have been the case in the presence of selection
effects (Bernstein & Jarvis 2002; Hirata & Seljak 2003).

Table 6 also shows the measured dispersion of shear estimators,
$\sigma_\gamma$, for each population. This statistic represents a
combination of the intrinsic ellipticity of galaxies and the shape
measurement/PSF correction noise introduced by each method. Lower
values will produce stronger measurements of weak lensing. Since
shear measurement is more difficult for smaller or fainter galaxies,
and the intrinsic morphology distribution of galaxies varies as
a function of magnitude in images other than set B, $n_{\rm gals}$ and
Figure 4. An example of the input vs measured shear for one
representative method. This is for the first component of shear measured by
the KK method in image set F. It is neither the best method on this image
set, nor the best image set for this method, but shows behaviour that is
typical of most. The grey squares and diamonds show results from
independent analyses of the rotated and unrotated images; the black circles
show the effect of matching pairs of otherwise identical galaxies. The
bottom panel shows deviations from perfect shear recovery, which is
indicated in both panels by solid lines. Linear fits to the data are shown
as dashed lines. The fitted parameters $m$ (shear calibration bias) and $c$
(residual shear offset) are plotted for all methods and for all image sets
in figure 5.

$\sigma_\gamma$ are likely to be correlated in a complicated fashion.
Galaxy selection effects and weighting schemes are discussed in §5.6 and §5.7.

4.3 Shear calibration bias and residual shear offset 



As with STEP1, we assess the success of each method by comparing
the mean shear measured in each image with the known input
shears $\gamma^{\rm input}$. We quantify deviations from perfect shear
recovery via a linear fit that incorporates a multiplicative "calibration
bias" $m$ and an additive "residual shear offset" $c$. With a perfect
shear measurement method, both of these quantities would be zero.






Author  Image  $n_{\rm gals}$       mean mag    % mag     $\sigma_\gamma$
        set    original / matched   (original)  decrease  original / matched
JB      A      37(0)  / 25          24.04       1.2       0.012 / 0.007
        C      28(1)  / 21          23.50       1.0       0.014 / 0.008
C1      A      51(2)  / 45          23.70       0.3       0.008 / 0.003
        C      46(2)  / 40          23.64       0.4       0.009 / 0.003
C2      A      50(2)  / 45          23.70       0.3       0.008 / 0.003
        C      45(2)  / 40          23.64       0.4       0.009 / 0.003
MH      A      38(0)  / 35          23.68       0.4       0.008 / 0.003
        C      33(0)  / 29          23.56       0.5       0.009 / 0.004
HH      A      28(0)  / 26          23.05       0.2       0.010 / 0.002
        C      24(0)  / 21          22.97       0.3       0.012 / 0.002
MJ      A      27(1)  / 24          23.30       0.3       0.009 / 0.003
        C      25(0)  / 22          23.26       0.4       0.009 / 0.003
MJ2     A      27(1)  / 24          22.58       0.1       0.014 / 0.002
        C      25(0)  / 22          22.48       0.2       0.016 / 0.002
KK      A      32(0)  / 26          23.46       0.5       0.009 / 0.003
        C      27(0)  / 21          23.35       0.5       0.010 / 0.003
RM      A      36(0)  / 32          23.41       0.3       0.009 / 0.002
        C      27(0)  / 23          23.21       0.4       0.010 / 0.003
RN      A      22(1)  / 19          23.10       0.3       0.009 / 0.003
        C      16(1)  / 13          23.03       0.5       0.011 / 0.004
SP      A      27(11) / 15          23.13       0.4       0.014 / 0.003
        C      25(10) / 13          23.10       0.4       0.016 / 0.004
MS1     A      43(1)  / 39          23.68       0.3       0.007 / 0.003
        C      37(1)  / 33          23.55       0.3       0.008 / 0.003
MS2     A      41(1)  / 36          23.46       0.1       0.010 / 0.004
        C      35(1)  / 30          23.26       0.1       0.013 / 0.006
TS      A      40(0)  / 36          23.74       0.5       0.008 / 0.004
        C      34(0)  / 29          23.64       0.6       0.010 / 0.005
ES1     A      40(0)  / 34          23.81       0.6       0.008 / 0.003
        C      35(0)  / 30          23.71       0.7       0.008 / 0.003
ES2     A      40(0)  / 34          23.74       0.6       0.016 / 0.009
        C      35(0)  / 30          23.69       0.7       0.017 / 0.009

Table 6. Number density of galaxies used by each method, and the shear measurement noise from those galaxies. The numbers of galaxies per square arcminute
are listed for the unmatched unrotated/rotated catalogues and after matching. The number in brackets is the percentage of stars or false detections.



Since the input shear is now applied in random directions, we mea- 
sure two components each of m and c, which correspond to the two 
components of shear, 

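\begin{align}
\langle\tilde{\gamma}_1\rangle - \gamma_1^{\rm input} &= m_1\,\gamma_1^{\rm input} + c_1 \nonumber \\
\langle\tilde{\gamma}_2\rangle - \gamma_2^{\rm input} &= m_2\,\gamma_2^{\rm input} + c_2 \,. \tag{43}
\end{align}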

An illustrative example of one typical measurement of the first
component of shear is shown in figure 4. The grey points correspond
to sets of rotated and unrotated galaxies, and are explained
in §4.4. In this example, the negative slope of the black dashed line
in the bottom panel ($m_1$) shows that this method systematically
underestimates shear by ~2.5%. However, the negligible y-intercept
shows that the PSF was successfully corrected and no residual shear
offset ($c_1$) remained. The measurement of the second component
of shear is not shown. Note that the range of input shear values
is smaller than in STEP1 and, in this weak shear regime, none
of the methods exhibits the non-linear response to shear seen with
the strong signals in STEP1. We therefore do not attempt to fit a
quadratic function to any of the shear in vs shear out results.
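
The linear fit described above can be sketched as follows. This is only a minimal illustration, assuming one mean shear measurement per image and equal weights; the function name `fit_bias` and all of the synthetic numbers (input shear range, injected bias, noise level) are hypothetical and are not taken from the STEP2 pipeline.

```python
import numpy as np

def fit_bias(gamma_input, gamma_measured):
    """Fit <gamma_measured> - gamma_input = m * gamma_input + c by least squares."""
    gamma_input = np.asarray(gamma_input, dtype=float)
    residual = np.asarray(gamma_measured, dtype=float) - gamma_input
    m, c = np.polyfit(gamma_input, residual, deg=1)  # slope m, intercept c
    return m, c

# Synthetic check (assumed values): inject m = -0.025 and c = 0.001,
# then confirm that the fit recovers them from noisy mean shears.
rng = np.random.default_rng(0)
gamma_in = rng.uniform(-0.06, 0.06, size=64)
gamma_out = (1.0 - 0.025) * gamma_in + 0.001 + rng.normal(0.0, 1e-4, size=64)
m1, c1 = fit_bias(gamma_in, gamma_out)
print(f"m1 = {m1:+.4f}, c1 = {c1:+.5f}")
```

The same function would be applied separately to each shear component to obtain the pairs ($m_1$, $c_1$) and ($m_2$, $c_2$).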

4.4 Combining rotated and unrotated galaxies 

An important advance in this second STEP project is the simultaneous
analysis of galaxies that had been rotated by 90° before
the application of shear and convolution with the PSF. This can
largely remove noise due to scatter in galaxies' intrinsic morphology,
but complicates the production of a joint shear catalogue,
especially where the galaxies are given different weights in the two
catalogues.

Taking the rotated and unrotated sets of images individually,
we obtain two sets of mean shear estimators
$\langle\tilde{\gamma}^{\rm unrot}\rangle$ and $\langle\tilde{\gamma}^{\rm rot}\rangle$,
which are defined in equations (25) and (26). We typically find that
$m_i^{\rm unrot} \approx m_i^{\rm rot}$ and $c_i^{\rm unrot} \approx c_i^{\rm rot}$.
Such stability to changes in image rotation is to be expected:
cross-talk between ellipticity and shear directions is second order
in $\gamma$ according to equation (2), and the mean ellipticity is
overwhelmingly dominated by the intrinsic ellipticities of a finite
number of galaxies (as demonstrated by the offset between the squares
and diamonds in figure 4). Intriguingly, for the MS1 and MS2 methods,
the shear calibration bias changes significantly between the rotated
and the unrotated catalogues, and when the two are matched. These
methods use smaller galaxies than most, including some 10-25% around
or below the stellar locus on a size vs magnitude plane, and this
effect may be caused by instabilities in the PSF correction of the
smallest. As an alternative explanation, there are also second-order
effects inherent in the non-linear lensing equation that involve the
dot product of ellipticity and shear, which would become significant
in the presence of an ellipticity-dependent selection bias. However,
we do not understand why this would affect only this pipeline and not others.
Figure 5. Fitted values of residual shear offset and shear calibration bias for each method and for each PSF. In all cases, the left hand panel shows results
for the $\gamma_1$ component of shear, and the right hand panel for the $\gamma_2$ component. The dotted lines show rms errors after a combined analysis of the rotated and
unrotated galaxies, after the two catalogues have been matched (and only common detections kept). The solid lines show the reduced errors after removing
intrinsic galaxy shape noise from the matched pairs of galaxies. Note that the scales on each panel are different, but the
frequency of the axis labels is preserved. The red points correspond to image set A. The black points correspond to image set B, and, where available, the
filled black circles reproduce results from STEP1. The pink, dark blue, light blue and green points correspond to image sets C, D, E and F respectively.

Figure 6. Comparison of shear measurement accuracy from different
methods, in terms of their mean residual shear offset $\langle c\rangle$ and mean shear
calibration bias $\langle m\rangle$. In the top panel, these parameters have been averaged
over both components of shear and all six sets of images; the bottom panel
includes only image sets A, B, C and F, to avoid the two highly elliptical
PSFs. Note that the entire region of these plots lies inside the grey band that
indicated good performance for methods in figure 3 of STEP1. The results
from methods C1, SP, MS1 and ES1 are not shown here.



We have not attempted to investigate this isolated effect in more 
detail. 

We obtain a third set of parameters $m_i$ and $c_i$ from the
matched catalogue with $\langle\tilde{\gamma}\rangle$ defined in equation (27). In general,
we find that $m_i \approx (m_i^{\rm unrot} + m_i^{\rm rot})/2$ and
$c_i \approx (c_i^{\rm unrot} + c_i^{\rm rot})/2$,
with significantly smaller errors in this matched analysis. An
example of all three shear estimators for the KK method on image set
F is plotted in figure 4. The fitted parameters for all of the shear
measurement methods, on all of the PSFs, are shown in figure 5.
Parameters measured from the matched pair analysis are also tabulated
in the appendix. Results from the most successful methods are
averaged across all of the sets of simulated images and compared
directly in figure 6.
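
The benefit of matching rotated pairs can be illustrated with a toy calculation. The sketch below assumes the first-order weak lensing approximation, in which each observed ellipticity is the sum of the intrinsic ellipticity and the shear, and it assumes that a 90° rotation simply reverses the sign of the intrinsic ellipticity. The shear value, ellipticity dispersion and measurement noise are illustrative choices, not STEP2 values.

```python
import numpy as np

rng = np.random.default_rng(1)
n_gal = 1000
gamma = 0.03 + 0.01j  # assumed input shear, written as gamma_1 + i*gamma_2

# Intrinsic ellipticities; a 90 degree rotation maps e_int to -e_int.
e_int = rng.normal(0.0, 0.3, n_gal) + 1j * rng.normal(0.0, 0.3, n_gal)

def measurement_noise():
    """Independent per-galaxy measurement noise (assumed Gaussian)."""
    return rng.normal(0.0, 0.01, n_gal) + 1j * rng.normal(0.0, 0.01, n_gal)

# First-order approximation: observed ellipticity = intrinsic + shear + noise.
e_unrot = e_int + gamma + measurement_noise()
e_rot = -e_int + gamma + measurement_noise()

mean_unrot = e_unrot.mean()                    # dominated by shape noise
mean_rot = e_rot.mean()                        # shape noise of opposite sign
mean_matched = (0.5 * (e_unrot + e_rot)).mean()  # intrinsic shapes cancel

err_unrot = abs(mean_unrot - gamma)
err_matched = abs(mean_matched - gamma)
print(f"unmatched error {err_unrot:.5f}, matched error {err_matched:.5f}")
```

In this toy model the matched estimator is limited only by measurement noise, which is the sense in which the pair analysis removes intrinsic shape noise. Unequal galaxy weights in the two catalogues, which the text notes as a complication, are not modelled here.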

4.5 Analysis as a function of galaxy population 

It is possible to measure the mean shear correctly from a large
population of galaxies, but to underestimate the shear in some and
overestimate it in others. This was frequently found to be the case
in STEP2 data as a function of galaxy size or magnitude, but
correlations could also be present as a function of galaxy morphological
type. Anything that correlates with galaxy redshift is particularly
important, and figure 7 shows the correlation of shear calibration
bias and residual shear offset with galaxy size and magnitude for
an illustrative selection of shear measurement methods. Of course,
these proxies are not absolute: the fundamental parameters of interest
are the size of galaxies relative to the pixel or PSF size, and the
flux of galaxies relative to the image noise level. This must be taken
into account before drawing parallel conclusions on data sets from
shallower surveys or those taken in different observing conditions.
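
To make the first point concrete, the following sketch shows how opposite calibration biases in two sub-populations can cancel in the overall mean. It assumes two hypothetical, equally weighted populations ("bright" and "faint") with invented biases of +10% and -10%; none of these numbers come from the STEP2 results.

```python
import numpy as np

def fit_m(gamma_input, gamma_measured):
    """Slope m of the linear fit gamma_measured - gamma_input = m * gamma_input + c."""
    residual = np.asarray(gamma_measured) - np.asarray(gamma_input)
    m, _c = np.polyfit(gamma_input, residual, deg=1)
    return m

rng = np.random.default_rng(2)
gamma_in = rng.uniform(-0.06, 0.06, size=64)  # assumed input shears, one per image

# Hypothetical sub-populations with opposite calibration biases.
gamma_bright = (1.0 + 0.10) * gamma_in  # overestimates shear by 10%
gamma_faint = (1.0 - 0.10) * gamma_in   # underestimates shear by 10%

# Equal numbers and equal weights: the population mean looks unbiased.
gamma_all = 0.5 * (gamma_bright + gamma_faint)

m_bright = fit_m(gamma_in, gamma_bright)
m_faint = fit_m(gamma_in, gamma_faint)
m_all = fit_m(gamma_in, gamma_all)
print(f"m_bright = {m_bright:+.3f}, m_faint = {m_faint:+.3f}, m_all = {m_all:+.3f}")
```

A catalogue split by size or magnitude, as in figure 7, is what exposes biases of this kind.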

The results for the TS method are fairly representative of most
implementations of KSB+. The calibration bias changes by 0.2-0.3
between bright and faint galaxies. The mean shear calibration
bias changes between methods by merely raising or lowering this
curve. The ES2 curve is least affected, with only a ~5% change.
The shear calibration bias also generally changes as a function of
galaxy size. The HH method controls this the best, no doubt due to
its fitting of $P^\gamma$ as a function of size only. However, this method
still displays significant variation as a function of magnitude; it is
not clear in figure 7 because the final point expands the y-axis scale.
The fairly constant residual shear offset as a function of galaxy
magnitude is typical, as is the dramatic improvement for bigger
galaxies in the image sets D and E with highly elliptical PSFs. That
demonstrates that it is a PSF-correction problem. The RM method
behaves similarly to the implementations of KSB+.

Other methods exhibit more idiosyncratic behaviour. The
main difference is between the KK method and the others that use a
global shear responsivity $\mathcal{R}$. This was calculated only once for the
KK method, from the entire galaxy population. For the other methods,
it was recalculated using a subset of galaxies for each size and
magnitude bin. The large trends in the shear calibration bias as a
function of size and magnitude merely reflect the evolving
distribution of intrinsic galaxy ellipticities. The MJ, MJ2, RM and RN
methods also all look like this with a single value of $\mathcal{R}$, and the
KK method would presumably be improved by this step. The JB
results are atypical, but their additional noise level represents that
in all analyses lacking an optimal galaxy weighting scheme.



5 INTERPRETATION 

We shall now revisit the questions posed in the introduction, con- 
cerning the accuracy with which current methods can measure 
shear, and in which regimes that accuracy begins to deteriorate. 
By noting the variation of results with different PSFs, we shall in- 
vestigate the effects of changing atmospheric and observing con- 
ditions. We shall also investigate the effects of image pixellisation, 
galaxy morphology and morphology evolution, selection biases and 
weighting effects. In light of our results, we shall then review the 
Figure 7. Variation in shear calibration bias and residual shear offset as a function of galaxy magnitude and size, for a representative sample of methods.
The input values of these are used, which do not have noise. The "size" on the abscissae is the unweighted rms size of galaxies from equation (53) in
Massey & Refregier (2005). The six coloured lines in each plot correspond to the six sets of images, coloured in the same way as in figure 5. In all cases,
measurements of the two components of shear have been averaged.

consequences for previously published measurements of cosmic
shear.

The rotated pairs of galaxies provide an unprecedented level of 
discriminatory power, and we can now identify high level causes of 
shear measurement error. Overall, both the shear calibration (mul- 
tiplicative) bias and anisotropic PSF correction (additive) errors de- 
pend upon the PSF model. From this information, we can deduce 
that some aspects of shape measurement have been suitably con- 
trolled. We can deduce that others still provide difficulty, and it is 
work in these identified areas that will provide a route to the de- 
sired sub-percent level of precision. This section describes various 
lessons that we have learned from our tests, in terms of high level 
variables. 



5.1 PSF size 

Within the precision accessible by this analysis, all of the methods 
are reassuringly tolerant to reasonable changes in observing condi- 
tions. Image set A (0.6" FWHM PSF) represents typical seeing at a 
good site, and image set C (0.8" FWHM PSF) the worst that might 
be expected for a weak lensing survey after appropriate telescope 
scheduling. 

Differences in the residual shear offsets between the two sets
of images with different seeing are generally not significant. The
few methods with a significant difference are JB, MH, KK and ES.
In all four cases, the 2-3$\sigma$ offset is in $c_1$ but not $c_2$. The two KSB+
methods have a positive offset, and the two shapelets methods have
a negative one, but no general conclusion seems manifest.

As expected, most methods demonstrate minimal shear cali- 
bration bias with image set A, and fare slightly worse on image 
set C. Shear calibration bias for the JB and RN methods is stable 
to changes in observing conditions at the ~ 0.5% level. The MH 
KSB+ method achieves ~ 1% consistency, although its applied 
shear calibration factor is apparently a little overzealous. 

No global trends emerge that are able to include all of the
KSB+ methods. However, for the generally most successful KSB+
implementations by MH, HH and TS, as well as the BJ02 (MJ,
MJ2) methods, $m$ is higher in image set C than in set A. These
methods are all on the top row of table 3 and correct for the PSF by
subtracting combinations of shape moments. The trend is reversed
in the KK deconvolution method on the bottom row, and the
calibration bias does not vary in the JB and RN methods. These correct
for the PSF via a full deconvolution. Although not all implementations
of KSB+ fit this trend, it does suggest that the
isotropic component of the PSF might be overcorrected by
some moment subtraction schemes. Furthermore, as the PSF moments
get larger, this oversubtraction exaggerates pixellisation effects
(see §5.3). The best PSF correction is generally attained by
methods that model the full PSF and attempt to deconvolve each
galaxy - but this currently works on slightly fewer galaxies (see 

El- 

5.2 PSF ellipticity (and skewness) 

Image sets D and E demonstrate the ability of methods to correct
for highly elliptical PSFs, and can be compared to image set
F, which has a circularly symmetric PSF. Imperfect correction for
PSF anisotropy will emerge mainly as a residual additive shear offset,
$c$. The method that was most efficient at removing all the different
strengths of PSF anisotropy to better than 0.2% accuracy
was MJ/MJ2, and all of the PSF deconvolution methods had better
than 1% accuracy. The most successful KSB+ correction was the
HH implementation. The residual shear offsets are smallest with
large galaxies, and deteriorate only as galaxies get smaller. This
behaviour is as expected if the problems are caused by imperfect
PSF correction.

Many methods have a spurious residual shear offset in both
components of shear, while the PSF is highly elliptical in only the
$e_1$ or $e_2$ direction. This cross-contamination might come from the
ignored off-diagonal elements of the tensor in KSB+, and is
indeed slightly better controlled in MS2 (with the full tensor
inversion) than in MS1. However, this cannot explain all of the effect;
the off-diagonal elements are exactly zero for the circular PSF in
image set F, and a few methods (JB, C1, RN, SP, MS1, ES2) have
a significantly non-zero residual shear offset for even this set of
images.

A more likely source of the contamination lies in the measurement
of stellar ellipticities. The non-zero residual shear offsets
with image set F probably come from shot noise in the measurement
of PSF ellipticity, which is higher than the shot noise for
galaxies because of the smaller number of stars. It will therefore
be worthwhile to make sure that future methods gather the maximum
possible amount of information about the PSF. In particular,
small galaxies provide as much information about the PSF as about
their own shapes, and this is currently discarded. Furthermore, PSFs D
and E are not only highly elliptical, but also skewed. The centre
of those PSFs therefore depends strongly on the size of the weight
function used. While the main direction of ellipticity is not in doubt,
changing the centre of the PSF also perturbs its apparent ellipticity.
The C1 method, with a fixed stellar weight function and a constant
PSF model, removes stellar ellipticity more consistently than the C2
method, in which the size of the stellar weight function is altered to
match each galaxy (although matching the galaxy weight function
provides a better shear calibration). Methods that involve
deconvolution from a full model of the PSF, or correction of PSF
non-Gaussianity, and which allow the galaxy centroid to iterate during
this process, do indeed seem to be able to better control PSF
ellipticity and centroiding errors.

We cannot conclusively explain the cross-contamination of
both shear components by a PSF strongly elongated in only one
direction, but hypothesise that it is introduced by skewness and
substructure in the PSF. Neither of these is addressed by the formalism
of KSB+, and they are both controlled more reliably by newer
methods that explicitly allow such variation. However, it is also
worth noticing the remarkable success of most methods on other
image sets with more typical PSF ellipticities, and remarking that
this is still a small effect that will not dominate shear measurement
for the near future.

Our investigation of PSF effects in the STEP2 images is confused
by other competing manifestations of imperfect shear measurement,
and the realism of the simulations. The combination of
image pixellisation (see §5.3), correlated galaxy sizes and magnitudes,
and the evolution of intrinsic galaxy size and morphology
as a function of redshift all hinder interpretation. Higher precision
tests in the future will counterintuitively require less realistic
simulated images: for example, ones that are tailored to compare
otherwise identical galaxies at fixed multiples of the PSF size.

5.3 Pixellisation effects 

This is the first STEP project in which the input shear has been
applied in many directions, and in which the two components of shear
can be measured independently. In general, residual shear offsets $c$
are consistent between components. However, we find that the $\gamma_1$
component, aligned with the square pixel grid, is typically measured
more accurately than the $\gamma_2$ component, along the diagonals.
This is even observed for image set F, in which the analytic PSF
is circularly symmetric. Since there is no other preferred direction,
this phenomenon must be an effect of pixellisation. Image
pixellisation, which is similar (but not identical) to convolution,
slightly circularises galaxies, thereby reducing their ellipticity. Not
explicitly correcting for pixellisation may therefore explain both
the general 1-3% underestimation of $\gamma_1$, and the slightly larger
underestimation of $\gamma_2$, in which direction the distance between
pixels is exaggerated. For almost all methods, we consistently find that
$m_1 > m_2$.
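
The circularising effect can be illustrated with a deliberately simplified model. The sketch below assumes that pixellisation acts exactly like convolution with a square top-hat of width one pixel (the text notes that the two are similar but not identical), and that shapes are measured from unweighted second moments. Under those assumptions, convolution adds the top-hat variance, pixel²/12, to both diagonal moments. The function names and the galaxy size are hypothetical; the 0.2" pixel scale follows from the caption of table 5 (0.6" = 3 pixels).

```python
def polarisation(qxx, qyy, qxy):
    """Complex polarisation e1 + i*e2 from unweighted second moments."""
    return complex(qxx - qyy, 2.0 * qxy) / (qxx + qyy)

def pixellated_polarisation(qxx, qyy, qxy, pixel):
    """Polarisation after convolution with a square top-hat pixel.

    Assumption: a top-hat of width `pixel` adds variance pixel**2 / 12
    to qxx and qyy, and leaves the cross moment qxy unchanged.
    """
    box_var = pixel**2 / 12.0
    return polarisation(qxx + box_var, qyy + box_var, qxy)

# Illustrative elliptical Gaussian with axis-aligned rms sizes 0.20" and 0.15".
qxx, qyy, qxy = 0.20**2, 0.15**2, 0.0
e_true = polarisation(qxx, qyy, qxy)
e_pix = pixellated_polarisation(qxx, qyy, qxy, pixel=0.2)
print(f"|e| before: {abs(e_true):.3f}, after pixellisation: {abs(e_pix):.3f}")
```

Note that this toy model shrinks both ellipticity components by the same factor, so it reproduces the overall loss of ellipticity but not the difference between $m_1$ and $m_2$ described above, which presumably requires a treatment of the discrete sampling itself.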

In KSB+, there is no formal mathematical framework to
deal with image pixellisation. Two different approaches have been
adopted to approximate the integrals in equation (20) with pixellated
data. The C1 and C2 implementations calculate the value
of the weight functions at the centre of each pixel and then form
a discrete sum; all of the others numerically integrate the weight
functions by subdividing pixels into a number of smaller regions.
Neither approach is ideal. Independent experiments by Tim Schrabback,
running objects with Gaussian radial profiles through his
implementation of KSB+, have shown that pixellisation can cause a
systematic underestimation of e and , and an overestimation of 
P^^. This effect can be up to ~ 10% for small objects. However, as 
stars and faint galaxies are similarly affected, the error on the shear
estimate approximately cancels. Integration using linearly
interpolated sub-pixels makes the measurement more stable to the
sub-pixel position of the object centroid, but slightly increases the
individual bias. Bacon et al. (2001) tested a variant of the C1 method,
and found a similar ~13% overall calibration bias, which was used
to correct subsequent measurements. With hindsight, the different
calibrations of $\gamma_1$ and $\gamma_2$ are also already visible in that work.

The MJ2, KK and TS methods are least affected by pixellisation.
This might have suggested that the extraction of a shear estimator
by shearing circular objects removes the problem, were it not
for the peculiar behaviour of the RN method. For this method,
image sets A and C follow the usual pattern that $m_1 > m_2$, but that
bias is reversed when the PSF is circular (image set F and the
zero-ellipticity components of PSFs D and E). The SP method is similar.
Strangely, the JB method, which ostensibly tries the hardest to treat
pixellisation with mathematical rigour, displays the most difference
between $m_1$ and $m_2$. However, this method does break a trend by
not having an overall negative shear calibration bias. If this bias is
indeed caused by pixellisation, this method appears to have most
successfully eliminated it.

Pixellisation could also hinder shear measurement, and bring 
about the observed results, via two additional mechanisms. Firstly, 
it may exaggerate astrometric errors in the PSF, and produce the 
consequences described in the previous section. We would be un- 
able to distinguish these effects. Secondly, the undersampling of 
objects may also fundamentally prevent the measurement of their 
high order shape moments. All of the STEP2 PSFs (and hence the 
galaxies) are Nyquist sampled. It would be unfortunate for lens- 
ing if Nyquist sampling were theoretically sufficient to measure as- 
trometry, but not shapes. As it happens, for methods other than MJ, 
the pixellisation bias is more pronounced for image set C (with poor 
seeing, and therefore better sampled) than on image set A (with 
good seeing). This suggests that the pixellisation effects are not due 
to undersampling. The STEP1 simulations had the same pixel scale
but worse seeing (~1" FWHM), so objects were better sampled
there.



We therefore hypothesise that the circularising effects of
pixellisation explain the general underestimation of shear and the
differential calibration of the $\gamma_1$ and $\gamma_2$ components. Indeed, a dedicated
study of simulated images with varying pixel scales by High et al.
(in preparation) supports this view. They find that the shear
calibration bias of the RRG method tends to zero with infinitely small
pixels, grows linearly with pixel scale, and that the bias
$m_2 \approx \sqrt{2}\,m_1$.
Because of the isotropy of the Universe, this differential
calibration of shear estimators ought not affect two-point cosmic shear
statistics. But it can certainly affect the reconstruction of
individual cluster mass distributions, and is inherently quite
disconcerting. The next STEP project will feature sets of images with varying
pixel scales to investigate this effect on a wider scale. In the mean
time, dealing properly with pixellisation will provide a promising
direction for further improvement in shear measurement methods.

5.4 Galaxy morphology 

The introduction of complex galaxy morphologies tends to hinder
shear measurement with KSB+ methods. The shear calibration bias
is more negative with image set A (shapelet galaxies) than with
image set B (simple galaxies) for the C1, C2, MH, SP, MS1, TS and
ES1 implementations. Of the implementations of KSB+, only HH
and MS2 reverse this trend. This is perhaps not surprising, given
the inherent limitation of KSB+ in assuming that the ellipticity of a
galaxy does not change as a function of radius.

Many of the newer methods deal with complex galaxy mor- 
phologies very successfully. Particularly KK, but also the MJ and 
MJ2 methods, have no significant difference in the shear calibra- 
tion bias or residual shear offset measured between image sets A 
and B. Future ground-based shear surveys are therefore unlikely 
to be limited at the 0.5% level by complex galaxy morphologies. 
Indeed, it is apparent in figure 2 that most of the substructure in
galaxies that will be used for lensing analysis is destroyed by the
atmospheric seeing. Although complex galaxy morphologies may 
become important at the level of a few tenths of a percent, they do 
not currently pose a dominant source of error or instability in shear 
measurement from the ground. 

One of the crucial findings of this study, however, concerns 
the effect of galaxy morphology evolution. This could potentially 
affect the calibration of shear measurement as a function of galaxy 
redshift, and is investigated further in the next section. 

In the next STEP project, which will simulate space-based ob- 
servations, we shall repeat our investigation of galaxy morphology 
by comparing three similar sets of image simulations. Galaxy sub- 
structure will be better resolved from space and, because the galax- 
ies observed there are likely to be at a higher redshift, their intrinsic 
morphologies may be both more irregular and more rapidly evolv- 
ing. Both of these effects will amplify any differences seen from 
the ground. 

5.5 Shear calibration for different galaxy populations 

The STEP2 results reveal that the calibration bias of some shear
measurement methods depends upon the size and magnitude of
galaxies. There seem to be two causes. There is often a sudden
~30% deterioration of performance at very faint magnitudes, due
to noise being amplified during the nonlinear process of shear
measurement (and exacerbated by ellipticity-dependent galaxy weighting
schemes). This is even observed with many methods that are
otherwise robust (e.g. HH, MJ2, RN), and may urge more caution
in the use of faint galaxies at the limits of detection. There is also
a gradual transition in shear calibration between bright and faint
galaxies that is probably caused by evolution of the intrinsic
morphology distribution as a function of redshift. The observed
variation is least pronounced for image set B, in which the galaxies
explicitly do not evolve.

Shear calibration bias that changes gradually as a function of
galaxy redshift has important consequences for any weak lensing
measurement. In a 2D survey, it will change the effective redshift
distribution of source galaxies, with all the consequences discussed
by Van Waerbeke et al. (2006). In a 3D analysis, it will affect the
perceived redshift evolution of the matter power spectrum, and the
apparent large-scale geometry of the universe. During the STEP2
analysis, we have developed ways to partially control this, as a
function of other observables like galaxy size and magnitude. To
first order, these act as suitable proxies for redshift, but the
underlying causes will need to be well understood, because neither of
these is redshift. Even if the mean shear in size/magnitude bins
could be made correct, this would not necessarily imply that the mean
shear would be correct in redshift bins. The techniques could be
applied in multicolour surveys as a function of photometric redshift,
but this is not perfect either, not least because of the inevitable
presence of catastrophic photo-z failures.

The obvious place to start looking for shear calibration errors
is in the shear susceptibility and responsivity factors. All of the
KSB+ implementations allow variation in P^γ as a function of at
least one of galaxy size and galaxy magnitude. However, the
behaviour is neither well understood, nor stable at the desired level
of precision. Massey et al. (2005) have already observed that P^γ
fitted from a population ensemble varies for any given object as a
function of the catalogue selection cuts. There is less variation in
the shear calibration bias of the MS1 method (Δm ≈ 0.1), which
fits only the trace of P^γ, than of the MS2 method (Δm ≈ 0.2),
which models the entire tensor, except for image set B, in which
there is little variation in either. Realistic galaxy morphologies
therefore do not have a shear susceptibility that is a simple
function of these observables, and trying to model the variation of
all the components of this tensor merely adds noise. The TS
implementation of KSB+, which uses P^γ from individual objects,
suffers particularly from this noise, which enters into the denominator
of equation (18), and has at least as much sudden deterioration at
faint magnitudes as other methods. However, this method is about the
least affected by gradual variation in shear calibration bias, with
Δm ~ 0.05. Since galaxy size and magnitude are correlated, the
variation with galaxy magnitude usually carries over to variation
with galaxy size. However, the HH method has notably little
variation in m as a function of galaxy size. This is presumably due to
the particularly individual form of the function used to model P^γ(r_g).
Unfortunately, P^γ is not fitted as a function of galaxy magnitude,
and the HH method still shows strong (Δm ~ 0.1) variation with
this. The shear susceptibility in this implementation is calculated
separately in three magnitude bins, and correction of the faintest
galaxies therefore required an extrapolation.
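
The effect of noise in a denominator can be illustrated with a toy calculation. This is only a sketch under strong assumptions: the susceptibility is treated as a scalar, intrinsic shape noise is omitted, and the noise on each per-object susceptibility is drawn from a bounded uniform distribution so that no denominator approaches zero. None of the numbers are taken from the STEP2 pipelines.

```python
import random

rng = random.Random(1)
gamma_true = 0.03   # input shear
p_true = 1.0        # hypothetical true (scalar) shear susceptibility
n_gal = 20000

# Noisy per-object susceptibility; uniform noise keeps every denominator positive.
p_obs = [p_true + rng.uniform(-0.3, 0.3) for _ in range(n_gal)]
ellip = [p_true * gamma_true] * n_gal  # shape noise omitted for clarity

# (a) Divide each ellipticity by its own noisy susceptibility.
shear_individual = sum(e / p for e, p in zip(ellip, p_obs)) / n_gal

# (b) Divide by a susceptibility averaged over the whole ensemble.
p_mean = sum(p_obs) / n_gal
shear_ensemble = sum(e / p_mean for e in ellip) / n_gal

bias_individual = shear_individual / gamma_true - 1.0
bias_ensemble = shear_ensemble / gamma_true - 1.0
```

Because 1/x is a convex function, symmetric noise in the denominator does not average away, so the per-object estimate is biased even though the noise itself has zero mean. In this toy model the bias is a few per cent, while the ensemble-averaged version is consistent with zero. The sign and size of the effect in a real pipeline will depend on the details of the noise and the weighting.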

Many of the other shear measurement methods require global
calibration via a responsivity factor R, which is determined from
the distribution of galaxy ellipticities. This factor is designed to
ensure that the mean shear in a population is unbiased. However,
it must be calculated from precisely that population. For the KK
method, it was calculated only once, from the entire catalogue.
Although it estimated the overall mean shear correctly, it then
underestimated the shear in small/faint galaxies, and overestimated
that in large/bright galaxies. This bias was addressed for the
MJ, MJ2, RM and RN methods by recalculating R within each
size and magnitude bin. There is no particular reason why this
should not, in future, be fitted and allowed to vary continuously
like the shear susceptibility in KSB+ methods. The estimates of
R in bins were more noisy, but removed the differential shear
calibration (in fact, the variation as a function of galaxy magnitude
was slightly overcorrected in the case of the MJ2 and RM methods).
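
The difference between a single global responsivity and one recalculated per bin can be sketched as follows. This is a toy model, not the procedure of any particular method: the per-galaxy responses (0.8 for faint galaxies, 1.0 for bright ones) are hypothetical values chosen so that the sign of the effect matches the behaviour described above, and measurement noise is ignored.

```python
def mean(values):
    return sum(values) / len(values)


def calibrated_shear(ellipticities, responsivities):
    """Mean shear estimate <e> / <R> for one population."""
    return mean(ellipticities) / mean(responsivities)


gamma_true = 0.03
# Hypothetical per-galaxy responses: faint galaxies respond less to shear.
responses = {"faint": [0.8] * 500, "bright": [1.0] * 500}
ellip = {name: [r * gamma_true for r in resp] for name, resp in responses.items()}

all_e = ellip["faint"] + ellip["bright"]
all_r = responses["faint"] + responses["bright"]
r_global = mean(all_r)

# One responsivity for the whole catalogue: correct on average only.
overall = calibrated_shear(all_e, all_r)
faint_global = mean(ellip["faint"]) / r_global
bright_global = mean(ellip["bright"]) / r_global

# Responsivity recalculated within each bin: correct in every bin.
faint_binned = calibrated_shear(ellip["faint"], responses["faint"])
bright_binned = calibrated_shear(ellip["bright"], responses["bright"])
```

In this sketch the global factor recovers the overall mean shear, but underestimates it for the faint population and overestimates it for the bright one; recomputing the factor within each bin removes the differential bias, at the cost (in real data) of noisier estimates of R.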



5.6 Galaxy selection effects 

There is a marked difference between the depth of the various 
galaxy catalogues. At one extreme, the C1/C2 catalogues are 
deeper, and more ambitious, than all others. At the other, the RN 
catalogue (and to some extent the MJ/MJ2 catalogue) is very shal- 
low. The RN method obtained extremely good results, but only 
from large and bright galaxies, and it would be interesting to test 
whether its PSF deconvolution iteration can converge with a deeper 
sample. The JB catalogue of individual rotated and unrotated images
is deeper, but not all of the galaxies at the magnitude limit
converged successfully, leading to a relatively shallow matched cat- 
alogue. We could conclude from this that the full deconvolution of 
every galaxy is an overly ambitious goal: it is a panacea for many 
image analysis problems, but all that we require is one shear estima- 
tor. Maximising the number density of useable galaxies will remain 
crucial in the near future, to overcome noise from their intrinsic el- 
lipticities. However, there has been far less time spent developing 
the deconvolution methods than the moment subtraction methods, 
so we reserve judgement for now because of their promise of ro- 
bust PSF correction. Furthermore, it is not only the methods that 
require complicated iterations that suffer from catalogue shortcom- 
ings: the SP catalogue includes a significant number of spurious 
detections (10%) and stars (1%). Neither of these contain any shear 
signal, and their presence partly explains the large, negative cali- 
bration bias of the SP method in the rotated and unrotated images 
(they are removed during the galaxy matching). 
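
The size of this dilution can be estimated with simple arithmetic. The sketch below assumes equal weights for all catalogue entries and that spurious detections and stars contribute exactly zero shear on average; the input shear value is arbitrary.

```python
gamma_true = 0.03
f_spurious, f_stars = 0.10, 0.01  # catalogue fractions quoted above
f_galaxies = 1.0 - f_spurious - f_stars

# Equal-weight mean shear: contaminants contribute zero on average.
mean_shear = f_galaxies * gamma_true + (f_spurious + f_stars) * 0.0
m_dilution = mean_shear / gamma_true - 1.0  # equivalent multiplicative bias
```

Under these assumptions the contamination alone corresponds to a multiplicative bias of m = −0.11, independent of the input shear.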

Most other methods use a fairly standard density of ~ 30 
galaxies per square arcminute in this simulated data. This is 
unlikely to be increased dramatically by any future weak lensing 
observations. Since selection effects in the STEP2 analysis must 
be measured from the individual unrotated and rotated catalogues, 
rather than the matched catalogues, the results about catalogue 
selection biases are hardly more profound than those of STEP1.



5.7 Galaxy weighting schemes 

The weighting schemes applied to galaxies also vary significantly 
between methods used in this paper, and these do affect the results 
in the matched catalogue. Most of the methods increase the con- 
tribution to the estimated mean shear from those galaxies whose 
shapes are thought to be most accurately measured. Such schemes 
have long been used in the analysis of real 2D data, but the ex- 
act form of the weighting scheme as a function of size, magnitude 
and ellipticity varies widely. Even more sophisticated weighting 
schemes will also need to be developed for the 3D analyses es- 
sential to fully exploit future weak lensing surveys. 

Figure 8. Comparison of shear measurement in real CFHTLS deep data,
from a galaxy-by-galaxy comparison of matched catalogues from the ES1
analysis (Semboloni et al. 2006a) and a reanalysis using the HH method.
The relative calibration of both components of shear is indistinguishable,
and both are here included in the same plot. A slope of unity would imply
perfect agreement. The dashed line indicates the relative calibration of the
two methods in simulated image set C, which is the most closely matched to
actual observing conditions. Although this should not be regarded as a strict
prediction, since there are many image parameters that are not matched, its
agreement with the real data is striking.

Figure 9. Comparison of shear measurement in real CFHTLS deep data,
as a function of galaxy size and magnitude. The relative shear calibration
of the ES1 and HH methods is obtained from the ratio of the mean shear
calculated in 3′ × 3′ subfields of each CFHTLS deep field. A value of unity
would imply perfect agreement between the catalogues. Note that we have
reconciled the different definitions of galaxy size in the simulations
compared to real data by approximating R ≈ r_g. We have dealt with the
different relationship between galaxy magnitude and signal-to-noise
(c.f. §4.5) by offsetting the magnitudes of objects in the deeper simulated
data by −1. The grey band indicates the relative calibration of the two
methods in simulated image set C, which is the most closely matched to
the CFHTLS data.

In this analysis, the effectiveness of each weighting scheme
can be seen in the difference between the size of error bars in the
analysis of independent galaxies and of rotated/unrotated pairs of
matched galaxies. In the independent analysis, the scatter includes
components from intrinsic galaxy shapes and measurement noise
(e.g. due to photon shot noise). The former is essentially removed
by matching pairs of galaxies. If a set of error bars shrinks
dramatically after matching, the measurement was dominated by intrinsic
galaxy shapes: this is the ideal situation. If the error bars change
little, the measurement was dominated by measurement noise.
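
The cancellation of intrinsic shape noise in matched pairs can be illustrated with a short simulation. It is a sketch only, assuming that the second image of each galaxy is rotated by 90° (so that its intrinsic ellipticity changes sign), that shear simply adds to ellipticity in the weak lensing limit, and that a single ellipticity component suffices. The dispersions are hypothetical.

```python
import random
import statistics

rng = random.Random(42)
gamma = 0.02        # input shear (one component)
sigma_int = 0.30    # hypothetical intrinsic ellipticity dispersion
sigma_noise = 0.05  # hypothetical measurement noise per image

singles, pairs = [], []
for _ in range(5000):
    e_int = rng.gauss(0.0, sigma_int)
    e_a = e_int + gamma + rng.gauss(0.0, sigma_noise)   # unrotated image
    e_b = -e_int + gamma + rng.gauss(0.0, sigma_noise)  # image rotated by 90 degrees
    singles.append(e_a)
    pairs.append(0.5 * (e_a + e_b))  # intrinsic ellipticity cancels in the average

scatter_single = statistics.stdev(singles)
scatter_pair = statistics.stdev(pairs)
shrink = 1.0 - scatter_pair / scatter_single  # fractional reduction in scatter
```

With these numbers the scatter of the pair-averaged estimator is set by measurement noise alone, and is far smaller than that of a single galaxy. A method with larger measurement noise would show a correspondingly smaller reduction.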

The weighting schemes of MJ2 and KK are very effective in
this analysis: their error bars shrink by up to 75%. The weighting
schemes of HH, SP and MJ are similarly effective, but these methods
weight ellipticities using a function of ellipticity, which may be
less accurate in regimes where the mean shear is large, such as
cluster mass reconstruction. Indeed, the aggressive weighting scheme
of MJ was shown in STEP1 to be useful with small input shears,
but introduced a non-linear shear response that became important
if the shear was high. A new weighting scheme was developed for
MJ2 to address this concern; however, the range of input shears in
STEP2 does not provide sufficient lever arm to evaluate the
potential nonlinear response of any method.

The value of a successful weighting scheme is demonstrated 
by the lesser performance of methods without one. The JB, TS and 
ES2 methods apply crude weighting schemes that are merely a step 
function (cut) in galaxy size and magnitude. Their error bars shrink 
by only 30-50% during galaxy matching. Their results are also less 
stable to the sudden deterioration of performance seen in several 
methods with galaxies fainter than or smaller than a particular limit. 
This shortfall is easy to correct, and we urge the rapid adoption of 
a more sophisticated weighting scheme in those methods. 
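
The gain from a smooth weighting scheme over a simple cut can be seen in the standard error of a weighted mean. The sketch below compares a step function with inverse-variance weights, which are used here only as a generic example of a more sophisticated scheme and are not the weights of any STEP method. The per-galaxy errors and the cut threshold are hypothetical.

```python
import math


def error_on_mean(sigmas, weights):
    """Standard error of a weighted mean of independent estimates."""
    num = sum((w * s) ** 2 for w, s in zip(weights, sigmas))
    return math.sqrt(num) / sum(weights)


# Hypothetical per-galaxy measurement errors on shear.
sigmas = [0.02, 0.03, 0.05, 0.08, 0.12, 0.20, 0.35]

# Step function: keep galaxies below a threshold with equal weight, drop the rest.
step = [1.0 if s < 0.15 else 0.0 for s in sigmas]
# Inverse-variance weights: noisy galaxies are downweighted smoothly.
inverse_variance = [1.0 / s**2 for s in sigmas]

err_step = error_on_mean(sigmas, step)
err_ivw = error_on_mean(sigmas, inverse_variance)
```

For independent estimates, inverse-variance weights minimise the error on the mean, so `err_ivw` cannot exceed `err_step` for any choice of threshold. This says nothing about bias, which is the concern raised above for weights that depend on ellipticity.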

It is important to remember the limitations of the STEP
simulations when optimising a galaxy weighting scheme, because of their
inherent simplification that all galaxies are sheared by the same 
amount. In real data, the lensing signal increases cumulatively with 
redshift, and the distant galaxies therefore contain the most valu- 
able signal. However, when weighting objects by the accuracy of 
their shape measurement, it is the contribution of these small, faint 
sources that is usually downweighted. It would instead be better to 
set weights that vary as a function of the signal to noise in shear 
signal - although the exact variation of the signal is of course un- 
known in advance. A statistically "optimal" weighting scheme ver- 
ified from the STEP simulations will therefore not be optimal in 
practice. Weighting schemes can also act like calibration biases as 
a function of galaxy redshift, exacerbating the problems of differ- 
ential shear calibration discussed in the previous section. 

5.8 Consequences for previously published measurements 

The largest cosmic shear survey to date, which has been published
since STEP1, comes from the Canada-France-Hawaii Telescope
Legacy Survey (CFHTLS) i-band data. The CFHTLS wide survey
(Hoekstra et al. 2006) was analysed using the HH shear
measurement method, and the CFHTLS deep survey (Semboloni et al.
2006a) using the ES1 method. These methods perform very
differently on the simulated images.

The HH method recovers shear in the STEP2 images with
remarkable success. The seeing in the CFHTLS data is most similar
to that in image set C, for which the overall shear calibration
is within 1%: well within the current error budget. Hoekstra et al.
(2006) also featured a parallel analysis using an independent KSB+
pipeline, which agreed with the HH results, and also demonstrates
the potential robustness of KSB+ at this level of precision (similar
comparisons have also been performed by Massey et al. (2005)
and Schrabback et al. (2006), and these also give results consistent
with that work). The HH method had difficulty only with the
calibration of very faint galaxies, due to its non-smooth fitting of
P^γ as a function of magnitude. If a similar bias is present in the
CFHTLS analysis, it will have lowered the effective redshift
distribution of source galaxies, and slightly diluted the overall signal.
Both of these effects would have led to an underestimation of σ8,
although only by a small amount, due to the low weight given to
faint galaxies. As discussed by Van Waerbeke et al. (2006), a more
significant bias (which acts in the opposite sense) arises from using
the Hubble Deep Field to infer the redshift distribution of galaxies.
As the survey area of the CFHTLS grows, and the statistical error
bars decrease, it may be prudent for this analysis to conservatively
use slightly fewer galaxies.

Figure 10. Comparison of shear-shear correlation functions measured from real CFHTLS deep survey data, after HH (squares) and ES1 (circles) analyses. The
correlation functions are split into E- and B-modes in two different ways: the variance of the shear in cells is shown on the left as a function of cell radius,
and the variance of the mass aperture statistic is shown on the right. In both cases, the solid points show the E-mode, and the open points the B-mode. The
error bars show statistical errors only (i.e. no account is made for cosmic variance since the survey region is identical), but note that the difference between the
two data sets is in fact more significant than indicated, because the same galaxies are used in each analysis, so noise enters only from the shape measurement
process and not from variation in intrinsic galaxy ellipticities. In the lower panels, the points show the ratio of the E-modes calculated from the two analyses,
and the lines show the ratio of the E-modes plus B-modes. The grey bands indicate the relative calibration of the two methods in simulated image set C, which
is the most closely matched to actual observing conditions.

The ES1 method underestimates shear in the STEP2 images
by 20% overall, and by as much as 30% for the faintest galaxies.
We have verified this result retrospectively in STEP1 simulations,
and also confirmed it in real images, by comparing the results of the
HH and ES1 shear measurement pipelines on the same CFHTLS
deep data. Of course, the true "input" shear is not known for real
data. Figure 8 shows the relative calibration of the two methods in
real data, with the dashed line indicating their relative calibration
in simulated image set C. This should not be interpreted as a strict
prediction, since the simulation was not designed to mimic this
specific survey: the simulated and real data have very different noise
properties, and the only similarity between their PSFs is their size.
Nonetheless, the agreement is impressive. Figure 9 shows a further
comparison of the methods' relative calibration, in which galaxies
have been split by size and magnitude. Once again, overlaying
the performance of ES1 from image set C confirms the results of
the STEP simulations with remarkable success. A likely source of
the shear calibration bias is in the smoothing of P^γ as a function
of r_g and magnitude. Tests indicate that the shear susceptibility is
more stable if it is instead fitted as a smooth function of size and
magnitude, or even by using the raw values. The strong magnitude
dependence is probably related to the sudden drop at small sizes.
Note also that both pipelines started from scratch with the
individual exposures, reducing them and stacking them independently. All
of the available exposures are stacked in both versions, so the two
sets of images have effectively the same depth. The full data
reduction pipeline of both groups is being tested, and the differences
could therefore have been introduced at any stage.
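
Because the true shear is unknown in real data, only the relative calibration of two methods can be measured, for example as the slope of one catalogue's shears against the other's. The following noiseless sketch shows why that slope equals the ratio of the two calibrations. The shear values and the biases assigned to each method are illustrative and are not fitted to any data.

```python
def slope_through_origin(x, y):
    """Least-squares slope of y = s * x, with no intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Hypothetical true shears of matched galaxies (unknown in real data).
gamma = [-0.05, -0.03, -0.01, 0.02, 0.04, 0.06]
m_hh, m_es1 = -0.01, -0.20  # illustrative calibration biases only

shear_hh = [(1 + m_hh) * g for g in gamma]
shear_es1 = [(1 + m_es1) * g for g in gamma]

# Slope of ES1 against HH recovers (1 + m_es1) / (1 + m_hh).
relative_calibration = slope_through_origin(shear_hh, shear_es1)
```

With noisy catalogues, noise in the quantity on the horizontal axis would flatten a naive regression slope, which is one reason to prefer ratios of mean shears in subfields, as used for figure 9.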

Figure 10 shows the two-point correlation functions of the
matched shear catalogues (using the weights of the individual
catalogues), which are normally used to constrain cosmological
parameters at the end of a weak lensing analysis. Although the ES1
analysis consistently measures a lower signal than the HH analysis,
the discrepancy is not uniform on all scales. The relative bias is
most pronounced on small scales when measuring the variance of
the aperture mass statistic, and on both small and large scales for
the shear variance in cells. Such variation is not seen in the
galaxy-by-galaxy comparison of relative shear calibration. For example,
the signal in figure 9 is stable to changes in the size of the area over
which the shears are averaged.

We hypothesise that there may therefore be an additional
source of bias in the ES1 CFHTLS analysis, due to PSF anisotropy
residuals. Since the PSF anisotropy varies spatially, the residual
would average out across the survey, and not affect the overall bias.
The correlation functions were calculated using the procedure in
Van Waerbeke et al. (2005), which deals with an unknown constant
of integration in the calculation of σ²(θ) by forcing the B-modes
to zero on large scales. This prior on the B-modes can add spurious
power to the E-modes, and could have artificially re-raised the
cosmic shear signal. Indeed, the ratio of the sum of the E- and
B-modes between analyses is flatter than that of the E-modes alone.
Furthermore, the star-star correlation functions (Semboloni et al.
2006a) show an excess before PSF correction, on similar scales to
that observed in the left-hand panel of figure 10.

A naive correction for a 20% shear calibration bias in the
CFHTLS deep survey (Semboloni et al. 2006a) would raise the
measured value of σ8 almost proportionally. This would remain
within the estimated error budget for the lensing analysis due to
non-Gaussian cosmic variance (Semboloni et al. 2006b), but adds
tension to an existing discrepancy with the three year results from
WMAP (Spergel et al. 2006). In practice, a more sophisticated
recalibration will probably be required. If our hypothesis of an
additional systematic is correct, this would have partially cancelled
the shear calibration bias. Judging by the ratio of the observed
correlation functions, the net underestimation of σ8 could have been
around 10-15%. More work is needed to test this hypothesis, but it
is beyond the scope of this paper. A full reanalysis of the CFHTLS
survey, including the latest data, will therefore follow.
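
The arithmetic behind such a naive correction is short. The sketch below assumes a purely multiplicative bias m applied uniformly to all galaxies, and takes from the text the statement that σ8 scales almost proportionally with the shear amplitude; it is not a substitute for a proper recalibration.

```python
m = -0.20                             # multiplicative shear calibration bias
shear_factor = 1.0 / (1.0 + m)        # rescaling needed to undo the bias
two_point_factor = shear_factor ** 2  # shear-shear correlations scale as (1 + m)^2

# Assumption (from the text): sigma_8 scales almost proportionally with shear.
sigma8_factor_naive = shear_factor
```

A 20% underestimate of shear therefore corresponds to a factor of 1.25 in shear amplitude, and a factor of about 1.56 in any two-point statistic of the shear.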

The striking confirmation of the STEP results on real data
demonstrates the success of our simulation project, and highlights
the vital role that artificial images will play in the exploitation of
future surveys. Ideally, they ought not be relied upon for simple
empirical recalibration, but they will be essential to verify the
performance of methods derived from first principles. The STEP
images remain publicly available to test future weak lensing
analyses. Simultaneously, the complexity of our correlation function
results also highlights the importance of subtleties in weak shear
measurement that may arise only within the complex environment
of real observational data. To fully understand such effects, we shall
pursue further development of the dataSTEP project, an ongoing
comparison of the output from various shear measurement methods
on a common sample of real data.



6 CONCLUSIONS 

Performance has improved since STEP1, and the STEP project
continues to drive progress and innovation in shear measurement
methods. The most accurate methods, with better than ~2% level
calibration errors for most of the tested observing conditions, were
the MJ2 implementation of BJ02, the TS and HH implementations
of KSB+, the KK and JB implementations of shapelets and the
RM implementation of Reglens. Particular advances are apparent
in methods that used the results of STEP1 to tune their algorithms,
which bodes well for the future of this project. For example, the
introduction of a calibration factor to the TS method has proved
reassuringly robust with our new, more realistic simulated images.
We have also verified the STEP results on real data, finding striking
confirmation of methods' relative shear calibration in the CFHTLS
deep survey.

No single shear measurement method does everything best. With the
increased precision possible in this analysis, we can now distinguish
all of the methods from perfect performance. Since absolute shear
calibration cannot be directly ascertained from real data, this remains
the most important issue.
The calibration bias in most methods leads to a slight underesti- 
mation of shear. Both the shear calibration (multiplicative) errors 
and anisotropic PSF correction (additive) errors are also found to 
depend upon characteristics of the PSF. Technical advances in indi- 
vidual methods will therefore still be required. Ideally, one would 
attempt to take the most successful aspect of several methods and 
combine them. The fundamentally different approaches to the two 
main tasks in shear measurement make this difficult, but there is 
common ground (e.g. object detection algorithms, the shapelet ba- 
sis functions, and galaxy weighting schemes), so the individual 
lessons learned with each method may not necessarily be irrecon- 
cilable. To this end, we have developed a classification scheme for 
shear measurement methods, and have described all existing meth- 
ods in a common language so that their similarities and differences 
are apparent. Development is continuing in earnest. 

We have used our improved simulations to identify various as- 
pects of shear measurement that have been effectively solved at the 
current level of precision. We have also uncovered other, specific 
areas that remain problematic. Studying these may provide a route 
to the most rapid technological advances. Development needs to be 
focussed towards: 

• Pixellisation 

• Correlated background noise 

• PSF measurement 

• Galaxy morphology evolution. 

These four points are explained below. 

This is the first STEP project in which the input shear has 
been applied in arbitrary directions relative to the pixel grid. That 
this direction affects the calibration of shear measurement methods, 
even for images with a circular PSF and no other preferred direc- 
tion, implies that pixellisation is not fully controlled. Pixel effects 
may also explain the general tendency of methods to underestimate 
shear. Since no explicit provision is made for pixellisation in many 
methods, this result is not surprising. This work has quantified just 
how much of an effect it has, and thereby emphasised the impor- 
tance of a proper treatment in the future. High et al. (in preparation) 
are specifically investigating pixellisation through tailor-made im- 
age simulations with varying pixel scales. 

Although not all data sets have background noise that is sig- 
nificantly correlated between adjacent pixels, it is particularly ap- 
parent in natively undersampled data, for which several exposures 
dithered by sub-pixel shifts must be co-added. The introduction of 
correlated background noise to the STEP2 simulations hindered 
several methods: during the detection of faint objects, the modelling
of objects to a specified fidelity, and the weighting of individual
shear estimators. Now that this issue has been raised, work is
underway in the context of several of the shear measurement methods.
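
How resampling introduces correlations between adjacent pixels can be seen in one dimension. The sketch below is a deliberate simplification of co-addition: it assumes that an exposure with uncorrelated noise is resampled onto a grid shifted by half a pixel using linear interpolation. Real co-addition schemes use other kernels and combine several exposures.

```python
import random


def lag1_correlation(x):
    """Sample correlation coefficient between adjacent pixels."""
    n = len(x) - 1
    a, b = x[:-1], x[1:]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


rng = random.Random(7)
white = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # uncorrelated pixel noise

# Resample onto a grid shifted by half a pixel using linear interpolation.
shifted = [0.5 * (u + v) for u, v in zip(white[:-1], white[1:])]

corr_white = lag1_correlation(white)
corr_shifted = lag1_correlation(shifted)
```

The original noise shows no significant correlation between neighbours, whereas each resampled pixel shares half of its input with the next one, giving a correlation coefficient close to 0.5. Detection thresholds and weights that assume independent pixels will then misjudge the noise.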

Various schemes have been developed to improve PSF
interpolation across a field of view (Hoekstra 2004; Jarvis & Jain 2004),
but some methods seem to be having trouble with the initial
measurements of the PSF from individual stars. The measurement of
the shape of each star affects shear estimates from many galaxies,
and is therefore of vital importance. When the PSF is highly
elliptical, this work has revealed some peculiar residual shear offsets,
in the directions orthogonal (at 45°) to that ellipticity. We have not
yet found a satisfactory explanation for this, but speculate that it
might be caused by difficulties measuring the centroid and the
ellipticity of stars that have substructure, skewness, and no single,
well-defined ellipticity. Methods that model the full PSF, and
especially those that attempt PSF deconvolution, are less affected, but at
the expense of having a smaller number density of useable galaxies
for which the complicated deconvolution algorithms currently
converge. This issue will require further investigation, and questions
about the residual shears cannot be addressed until this is resolved.

Issues of galaxy morphology evolution become particularly 
important for those methods whose calibration relies on the 
overall distribution of galaxies' intrinsic ellipticities. High redshift 
galaxies are both more elliptical and more irregular, and evolution
in the ellipticity variance directly affects the shear calibration.
For a 2D cosmic shear survey, even if the mean shear is correctly
measured, this can bias the effective redshift distribution of source
galaxies and the geometrical interpretation of the lensing signal,
with all the consequences discussed in Van Waerbeke et al. (2006).
For a 3D analysis, it can change the apparent redshift evolution of 
the signal and hence the apparent cosmological matter distribution. 

The next STEP project will analyse a set of simulated space- 
based images. With their higher spatial resolution, we expect that 
variation in galaxy morphology will more profoundly affect shear 
measurement. We will therefore repeat the exercise of comparing 
the analysis of complex shapelet galaxies with more idealised ob- 
jects, and also separate the galaxy populations by morphological 
class. The cuspy space-based PSFs will provide a different (easier) 
regime in which to test centering, and we shall explicitly avoid PSF 
interpolation errors by allowing methods to assume that the PSF is 
constant. This should make interpretation easier. Background noise
will also be left intentionally uncorrelated. However, variations in 
the pixel scale will be introduced, to specifically test methods' ro- 
bustness to pixellisation effects. 

Such ongoing improvements are vital to the success of gravi- 
tational lensing as a viable probe of cosmology. Although the mea- 
surement of weak lensing is not limited by unknown physical pro- 
cesses, the technical aspect of galaxy shape measurement at such 
high precision remains computationally challenging. In this paper, 
we have demonstrated that simulated images can drive progress in 
this field, and can provide a robust test of shear measurement on 
real data. Previous cosmic shear measurements would have ben- 
efitted from access to STEP, and the future exploitation of dedi- 
cated surveys relies upon the development of methods that are be- 
ing tested here first. Both the tools and the collective will are now 
in place to meet this challenge. The STEP simulations remain pub- 
licly available, and the weak lensing community is progressing to 
the next level of technical refinement in a spirit of open coopera- 
tion. We conclude with the hope that, by accessing the shared tech- 
nical knowledge compiled by the STEP projects, all future shear 
measurement methods will be able to reliably and accurately measure
weak lensing shear.

Author    Image set A  Image set B  Image set C  Image set D  Image set E  Image set F
…                   …            …            …   -1.13±1.04   -0.99±1.04   -0.39±1.14
                    …            …   -0.65±1.11   -3.67±0.99   -6.17±1.26   -4.20±1.22
RN         -2.28±1.27   -0.79±1.16   -4.16±1.57   -3.52±1.33   -3.90±1.35   -6.20±1.46
           -4.85±1.21   -3.04±0.96   -6.55±1.48   -5.26±1.28   -7.68±1.66   -6.18±1.53
SP        -10.52±1.25   -7.52±1.40  -12.60±1.49  -12.67±1.55  -14.41±1.34  -12.20±1.44
                    …            …            …            …            …            …
MS1       -15.19±1.15  -13.40±1.00  -22.79±1.30  -11.85±1.22  -15.45±1.25  -13.93±1.29
          -15.79±1.11  -12.76±0.85  -21.68±1.24  -11.92±1.19  -19.01±1.45  -14.87±1.56
MS2        -3.40±1.75   -8.09±1.30  -12.55±2.31   -0.70±2.08   -0.68±1.97   -1.99±2.10
           -2.94±1.75   -4.18±1.19   -6.55±2.21    5.13±2.07  -11.98±2.61   -1.70±2.40
TS         -1.43±1.47    2.82±1.57    0.26±1.87   -2.76±1.55   -3.69±1.58   -2.04±1.74
           -0.97±1.38    1.88±1.30   -2.54±1.67   -1.11±1.56   -7.81±1.98   -2.60±1.79
ES1       -15.51±1.27   -8.11±1.29  -19.03±1.34  -19.09±1.26  -17.31±1.26  -12.45±1.45
          -18.07±1.21   -8.02±1.06  -21.05±1.19  -19.65±1.17  -20.60±1.60  -16.80±1.51
ES2        13.66±3.28   11.68±3.34   -1.36±3.47    3.03±2.97    1.06±2.85    3.00±3.47
            4.61±3.10   14.64±2.70   -4.93±3.20    3.10±2.73   -3.82±3.61   -7.25±3.74

Table 1. Tabulated values of shear calibration bias (×10⁻²) from figure 5. In each entry, the top line refers to the first component of shear, and the bottom line
to the second. Entries marked … are missing or illegible.

Author    Image set A  Image set B  Image set C  Image set D  Image set E  Image set F
JB           -6.8±6.5    -17.2±5.4    -34.5±7.5     24.5±7.6     83.7±8.0     17.4±7.3
              1.3±6.6    -15.0±5.5     -1.0±7.5    -80.3±7.6     46.5±8.0     10.6±7.5
C1           21.2±2.5     26.7±2.5     -5.2±3.2    124.2±2.6     64.1±2.9    -11.8±2.9
             21.2±2.5     -5.4±2.6     23.2±3.1    -70.0±2.5    130.2±2.9      8.5±2.9
C2           -3.3±2.5     -1.1±2.5    -21.6±3.3    259.4±2.9     29.7±3.2     -6.2±2.9
             38.3±2.6     18.8±2.6     39.8±3.2    -36.6±2.9    276.6±3.2      3.6±3.0
MH           10.2±3.0     19.8±3.3     19.6±3.9    101.2±3.4     91.6±3.8     -4.4±3.6
              5.4±3.0     21.9±3.3      6.7±4.0    -84.2±3.6     99.3±3.8      6.3±3.6
HH            1.6±1.8     -4.8±1.8     -6.1±2.3      3.7±2.0     75.2±2.2     -2.2±2.1
             -4.6±1.8     -3.1±1.8     -0.6±2.2    -65.5±2.0     -5.9±2.1      9.8±2.1
MJ          -11.8±2.5     -9.5±2.2     -6.5±3.2     18.2±2.9     13.8±3.1     -2.2±2.8
             -0.9±2.6      6.0±2.1      1.6±3.1    -12.7±2.8     16.6±3.0      1.2±2.8
MJ2         -10.3±1.9     -4.8±1.7      0.4±2.3     23.9±1.9     15.5±2.2     -0.8±1.9
              1.5±1.9      3.1±1.7      2.3±2.2    -16.8±2.0     19.7±2.1      1.2±1.9
KK           -2.1±2.4     -5.2±2.7    -14.0±3.1    -71.6±2.8     66.6±3.0      0.1±2.8
             -2.7±2.4     -6.6±2.7      2.1±3.0    -69.5±2.7    -56.9±2.9     -3.9±2.8
RM           22.9±2.2     14.9±2.0     26.5±2.9    -33.5±2.5    112.0±2.8      0.1±2.6
             -9.9±2.2     -3.1±1.9     -5.8±2.8   -105.7±2.5    -19.4±2.7      2.4±2.7
RN           -5.3±2.8     -5.0±2.5     -6.3±3.8    -34.9±3.1     43.1±3.4      2.5±3.1
              1.8±2.7     -0.1±2.5      8.9±3.7    -33.1±3.2    -26.8±3.3      4.6±3.2
SP           -1.1±2.5     -3.4±2.9     -4.5±3.3    -69.9±3.3     71.6±3.4      5.5±3.0
             -1.1±2.7     -7.6±3.2     -4.6±3.8    -55.3±3.6    -13.3±3.5      4.1±3.3
MS1                 …            …            …            …            …            …
             10.3±2.6     23.1±2.1      7.2±3.2    -45.7±3.0     83.8±3.1      6.1±3.2
MS2          -7.9±3.9    -21.3±2.8      3.2±5.6    140.5±5.0     41.5±5.2     -0.4±4.9
             14.4±4.0     24.3±3.0     19.5±5.9    -28.7±5.1    154.4±5.4      9.2±5.3
TS           -2.9±3.3     -4.3±3.5      2.7±4.5    -46.2±3.9     70.3±4.3     -3.5±4.0
             -3.0±3.2     -1.3±3.6      0.4±4.4    -65.4±3.9    -40.3±4.2     -3.1±4.0
ES1          -9.1±2.8     -4.1±2.9      5.7±3.3    153.1±3.1     54.3±3.4     -5.5±3.4
              4.0±2.8      8.7±2.8      9.9±3.2    -58.7±3.0    132.0±3.4      0.4±3.3
ES2         -11.0±7.4      8.5±7.4     15.0±8.2     95.3±7.1     96.7±7.8    -10.4±8.2
            -11.2±7.4     -3.3±7.2      5.7±8.4    -92.9±7.1     11.9±1.1      7.7±8.1

Table 2. Tabulated values of residual shear offset (×10⁻⁴) from figure 5. In each entry, the top line refers to the first component of shear, and the bottom line
to the second. Entries marked … are illegible.



